Calhoun


Institutional Archive of the Naval Postgraduate School 





Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items


1989 


Conversion of hard-copy documents to digital 
format utilizing optical scanners and optical 
storage media 


Taylor, Robert Ryland. 


Monterey, California. Naval Postgraduate School 


http://hdl.handle.net/10945/25791 


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed (and published) scholarly author.

Dudley Knox Library / Naval Postgraduate School

411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 





http://www.nps.edu/library 
















NAVAL POSTGRADUATE SCHOOL 


Monterey, California 





CONVERSION OF HARD-COPY DOCUMENTS 
TO DIGITAL FORMAT UTILIZING
OPTICAL SCANNERS AND OPTICAL STORAGE MEDIA 











by 
Robert Ryland Taylor 


March 1989 





Thesis Advisor: Barry A. Frew 


Approved for public release; distribution is unlimited 




SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a.  REPORT SECURITY CLASSIFICATION: Unclassified
1b.  RESTRICTIVE MARKINGS:
2a.  SECURITY CLASSIFICATION AUTHORITY:
2b.  DECLASSIFICATION/DOWNGRADING SCHEDULE:
3.   DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release;
     distribution is unlimited
4.   PERFORMING ORGANIZATION REPORT NUMBER(S):
5.   MONITORING ORGANIZATION REPORT NUMBER(S):
6a.  NAME OF PERFORMING ORGANIZATION: Naval Postgraduate School
6b.  OFFICE SYMBOL (if applicable): Code 37
6c.  ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000
7a.  NAME OF MONITORING ORGANIZATION: Naval Postgraduate School
7b.  ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000
8a.  NAME OF FUNDING/SPONSORING ORGANIZATION:
8b.  OFFICE SYMBOL (if applicable):
8c.  ADDRESS (City, State, and ZIP Code):
9.   PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:
10.  SOURCE OF FUNDING NUMBERS: PROGRAM ELEMENT NO. / PROJECT NO. /
     TASK NO. / WORK UNIT ACCESSION NO.
11.  TITLE (Include Security Classification): CONVERSION OF HARD-COPY
     DOCUMENTS TO DIGITAL FORMAT UTILIZING OPTICAL SCANNERS AND OPTICAL
     STORAGE MEDIA
12.  PERSONAL AUTHOR(S): Taylor, Robert Ryland
13a. TYPE OF REPORT: Master's Thesis
13b. TIME COVERED: FROM        TO
14.  DATE OF REPORT (Year, Month, Day): 1989 March
15.  PAGE COUNT: 120
16.  SUPPLEMENTARY NOTATION: The views expressed in this thesis are those
     of the author and do not reflect the official policy or position of
     the Department of Defense or the U.S. Government.
17.  COSATI CODES: FIELD / GROUP / SUB-GROUP
18.  SUBJECT TERMS (Continue on reverse if necessary and identify by
     block number): Optical Scanners, OCR, ICR, Image Scanners, Optical
     Disks, CD-ROM, WORM, Optical Information System
19.  ABSTRACT (Continue on reverse if necessary and identify by block
     number):

     Storage of hard-copy archival paper documents requires a vast
     amount of storage space, and the documents take time to search and
     retrieve. Technology exists today to convert hard-copy texts with
     optical scanners and store them in digital format on optical disks.
     This thesis conducts in-depth research into the current technology
     of optical scanners, optical storage media, and optical information
     systems. Using the thesis documents presently stored in the library
     aboard Naval Postgraduate School as a statistical base, this
     research analyzes the requirements for converting the thesis
     documents to digital format. This thesis concludes that an image
     optical information system is a viable alternative to storing
     hard-copy documents and recommends follow-on thesis research to
     build an in-house optical information system.

20.  DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED
21.  ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL: Barry A. Frew
22b. TELEPHONE (Include Area Code): (408) 646-2924
22c. OFFICE SYMBOL: Code 54FW

DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted.
All other editions are obsolete.
U.S. Government Printing Office: 1986-606.243

SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED


Approved for public release; distribution is unlimited. 


Conversion of Hard-Copy Documents 
to Digital Format Utilizing 
Optical Scanners and Optical Storage Media 


by 
Robert Ryland Taylor
Lieutenant Commander, United States Navy 


B.S., University of Utah, 1977


Submitted in partial fulfillment 
of the requirements for the degree of 


MASTER OF SCIENCE IN INFORMATION SYSTEMS 
from the 


NAVAL POSTGRADUATE SCHOOL 


MARCH 1989 


ABSTRACT 


Storage of hard-copy archival paper documents requires a
vast amount of storage space, and the documents take time to
search and retrieve. Technology exists today to convert
hard-copy texts with optical scanners and store them in
digital format on optical disks.

This thesis conducts in-depth research into the current
technology of optical scanners, optical storage media, and
optical information systems.

Using the thesis documents presently stored in the library
aboard Naval Postgraduate School as a statistical base, this
research analyzes the requirements for converting the thesis
documents to digital format.

This thesis concludes that an image optical information
system is a viable alternative to storing hard-copy
documents and recommends follow-on thesis research to build
an in-house optical information system.




TABLE OF CONTENTS

I.   INTRODUCTION
     A. DISCUSSION
     B. SCOPE
     C. METHODOLOGY

II.  OPTICAL SCANNERS
     A. INTRODUCTION
     B. ELEMENTARY CONCEPT OF SCANNERS
     C. INDEPTH REVIEW OF HOW SCANNERS WORK
        1. The Light Path
        2. Light Signal to Digital
     D. OPTICAL CHARACTER READERS
        1. What is OCR?
        2. OCR Technology
     E. IMAGE SCANNERS
        1. Resolution
        2. Levels of Grayscale
     F. COMPRESSION/DECOMPRESSION OF A SCANNED IMAGE
        1. Modified Huffman Encoding
        2. Modified Read Encoding

III. OPTICAL STORAGE MEDIA
     A. INTRODUCTION
     B. ELEMENTARY CONCEPT OF OPTICAL DISKS
     C. CD-ROM: COMPACT DISK-READ ONLY MEMORY
     D. WORM: WRITE-ONCE, READ-MANY
        1. Ablative Recording
        2. Vesicular Recording
        3. Phase-Change Recording
     E. MAGNETO-OPTICAL DISKS: ERASABLE OPTICAL STORAGE
     F. SUMMARY

IV.  OPTICAL INFORMATION PROCESSING
     A. DOCUMENT DATA PROCESSING
     B. OPTICAL INFORMATION SYSTEM
     C. BENEFITS OF OPTICAL INFORMATION PROCESSING
        1. Huge Capacity/Space Savings
        2. Speed of Retrieval
        3. Shared Access/Remote Availability
        4. File Integrity
        5. Archival Life
        6. Cross Reference Indexing
        7. No Head Crashes
        8. Distribution

V.   METHODOLOGY AND DATA
     A. INTRODUCTION
     B. OPTICAL CHARACTER READER EVALUATION
     C. IMAGE SCANNER EVALUATION
     D. RESEARCH QUESTIONS
        1. What is the time required to convert text to data?
        2. How accurate is converting to digital format?
        3. What are the digital storage requirements?

VI.  ANALYSIS
     A. THESIS DOCUMENT STORAGE REQUIREMENTS
     B. COST ANALYSIS
        1. Optical information system cost analysis.
        2. Hard-copy to digital conversion cost analysis.
        3. Summary of Cost Analysis.

VII. CONCLUSIONS, RECOMMENDATIONS
     A. CONCLUSIONS
     B. RECOMMENDATIONS

APPENDIX A: ORIGINAL PAGES USED FOR SCANNING RESEARCH
APPENDIX B: RESULTS USING AN OPTICAL CHARACTER READER
APPENDIX C: RESULTS USING AN IMAGE SCANNER
GLOSSARY
LIST OF REFERENCES
BIBLIOGRAPHY
DISTRIBUTION LIST


I. INTRODUCTION 


A. DISCUSSION 
The reference room at the Knox Library aboard Naval
Postgraduate School is reaching its maximum capacity for
shelving archive documents. Technology exists today to
convert hard-copy documents with optical scanners and store
them in digital format on optical disks.

    Companies are finding economic as well as technical
    virtues to optical-disk technology that justify going
    optical. Some firms can cost-justify these systems by
    the space they save versus all other data storage media.
    The value of a document may be beyond measure, but a
    square foot of a floor space occupied by a filing
    cabinet certainly has its price. (Alter, 1988, p. 18)

Converting to digital format will require less storage
space and provide a faster search and retrieval
capability. An optical-disk-based information system made
by LaserData of Lowell, Mass., was installed at the Maine
Medical Center in Portland, Maine, allowing the hospital
to clear an entire floor (7,200 square feet) of a building
dedicated to medical records and radiology records. The
hospital was out of storage space, so it was looking to
recapture space rather than build new space.





Figure 1 Today's paper filing system. ("Wang Laboratories, Imaging - Primer 
Series," 1987, p. 4) 


In Portland, building new storage space would have cost $100
per square foot. Recouping 7,200 square feet at $100 per
square foot equals a savings of $720,000. That was the
primary cost justification. Another was being able to find
records instantly, which improved overall patient care.
(Alter, 1988, p. 18)
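
The space-savings arithmetic above can be written as a short
calculation. This is only an illustrative sketch: the function name
space_savings is hypothetical, and the figures are the ones quoted
for the Maine Medical Center.

```python
def space_savings(square_feet: float, cost_per_square_foot: float) -> float:
    """Return the construction cost avoided by reclaiming existing floor space."""
    if square_feet < 0 or cost_per_square_foot < 0:
        raise ValueError("inputs must be non-negative")
    return square_feet * cost_per_square_foot


# Figures quoted for the Maine Medical Center (Alter, 1988, p. 18).
maine_savings = space_savings(7_200, 100)
print(f"${maine_savings:,.0f}")  # prints $720,000
```

The same function could be applied to any site where reclaimed floor
space and local construction cost per square foot are known.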

Another example of current utilization, USAA (United 
Services Automobile Association) set up a 1,300 workstation 
document-processing system to be shared by over 2,000 
employees in its property and casualty policy-service 
operation. 


According to Charles A. Plesums, manager of image 
systems, the company began processing 2 percent of that 


operation's document workload in mid-July 1988 and will 
expand to 100 percent by early 1989. 

Seven years from now, he added, optical disks will 
store 300 million pages and save 39,000 square feet of 
space. (Lesher, 1988, p. 33) 

This thesis will consider the analysis and design of an
Optical Scanner to Optical Storage Medium Document
Processing System.


B. SCOPE 

This thesis includes an in-depth review of the Optical
Scanner and Optical Storage Medium technologies presently
available. The purpose of this review is to provide the
reader with the different options available for designing an
Optical Information System.

An analysis of the requirements for implementing an archival
optical-disk-based document processing system will be
conducted. Alternative systems and solutions will be
addressed and a recommendation will be submitted for
possible implementation.


C. METHODOLOGY 
Utilizing the thesis texts presently stored in the
library as a statistical population, a small sample will be
used to conduct research on Optical Scanners. Questions to
be answered concerning Optical Scanners will be: 1) What is
the time required to convert text to data? 2) How accurate
is the conversion of hard-copy text to digital format? 3)
What are the digital storage requirements? The Optical
Storage Media that will best match the requirements of an
Optical Scanner will also be reviewed. Presently there is
thesis research being conducted in the areas of indexing an
optical disk using hypertext and of storage requirements
using optical disk. This thesis will primarily emphasize
Optical Scanners and the initial phase of conversion in an
Optical Information System.


II. OPTICAL SCANNERS 


A. INTRODUCTION 

The initial phase of converting hard-copy text to
digitized format in an Optical Information System is
scanning the document either by an Optical Character Reader
or by an Image Scanner.

Modern scanners can combine the functions of reading text
and processing image information because they contain more,
and more complex, components and algorithms than did
earlier scanners. No scanner yet exists that can scan a
page and interpret both text and graphics in a single pass.
However, software now exists that provides the option of
utilizing an image scanner to scan for either text or
graphics in a single pass and then combining the two to
produce a digitized copy of the original.


B. ELEMENTARY CONCEPT OF SCANNERS 

A scanner obtains optical information (about light and
dark areas on the image) from the original image. Next, the
electronic converter units translate that optical image
information into digital information. A processor unit
manipulates the digital data according to specified
instructions, in order to create an output image that can
be, in some way, different from the original image. Figure
2 illustrates the conversion from a scanned image to digital
format for comparison. The reader unit of a scanner combines
a light source, several mirrors, and a lens. These
components illuminate the original image and reflect light
from it. More light is reflected from lighter areas of the
original than from darker areas.

A photoconverter converts information about the
reflected light into an electrical voltage. An
analog-to-digital converter further changes the electrical
(analog) information into a digital (binary) data format.

The digital data is passed to an image processor, where
it can be manipulated to produce the desired output. In the
image processor, adjustments may be made to the size and
shape, resolution, and contrast of the output image.





[Figure: a light source illuminates the page; dark areas absorb
the light and light areas reflect it; a photodiode array or CCD
translates the light and dark areas into binary data (0s and
1s); the bit image of each character is compared with known
images and, if the match is successful, an ASCII code is sent.]

Figure 2 Text and Graphics Scanning Techniques. (PC Magazine,
1986, p. 134)


C. IN-DEPTH REVIEW OF HOW SCANNERS WORK
1. The Light Path 

The original image is first illuminated by a light 
source in the reader unit of a modern scanner. Light 
reflected from the original image is passed by mirrors to a 
lens, which focuses the reflected light and passes it from 
the reader unit to the converter units. 

Figure 3 illustrates the path that the light takes
from the original image to the CCD (Charge-Coupled Device),
or photoconverter.

The above listed steps of the scanning process are 
essentially the same for both text and image processing. 
However, depending on the scanner's design, either the page 
is moved over a fixed scanning element or the scanning 
element is moved over a fixed page. Most OCRs have a fixed
scanning element and most image scanners have a moving
scanning element.

The light source of the scanning element may be a
laser, or another type of high intensity lamp. For an image
scanner, the light source is mounted on a carriage, so that
it moves to illuminate the original image, not all at once,
but in a systematic manner. The carriage moves in the slow
scan direction, illuminating a strip of the original image
with each movement. While a strip is thus illuminated, the
fast scan occurs.
a. Slow Scan 
In the slow scan direction the light source moves 
stepwise to each strip of the original image, where it 
pauses while the fast scan takes place. The distance it 
moves is dependent on the resolution setting. This distance 
corresponds to the height of a pixel in the output image. 
b. Fast Scan 
In the fast scan direction the light source
pauses for a brief interval. Information from the
illuminated strip of the original image is read and
converted into digital data before it is processed. The
illuminated strip is divided into discrete sections. The
width of each section is determined by the resolution. This
width corresponds to the width of a pixel in the output
image.

Light reflected from the original image is thus
divided into discrete areas which are processed separately.
Each is represented as a pixel in the output image.
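The slow scan and fast scan steps described above can be sketched as two nested loops. This is only an illustration under stated assumptions, not the control logic of any particular scanner: the function name `scan_page` and the `sample` callback (a stand-in for the lamp, optics, and CCD) are hypothetical.

```python
def scan_page(width_in, height_in, ppi, sample):
    """Return a grid of reflectance samples, one per output pixel.

    sample(x_in, y_in) is a hypothetical stand-in for the optics: it
    returns the reflectance at a position given in inches.
    """
    step = 1.0 / ppi                  # pixel width and height, in inches
    rows = round(height_in * ppi)     # number of slow-scan steps (strips)
    cols = round(width_in * ppi)      # discrete sections per strip
    image = []
    for r in range(rows):             # slow scan: step to the next strip
        y = (r + 0.5) * step
        # fast scan: read every section of the illuminated strip
        strip = [sample((c + 0.5) * step, y) for c in range(cols)]
        image.append(strip)
    return image


# A 1 x 0.5 inch original whose left half is white, scanned at 4 PPI.
def half_white(x, y):
    return 1.0 if x < 0.5 else 0.0


grid = scan_page(1.0, 0.5, 4, half_white)
```

Each inner list is one illuminated strip, and each entry in it corresponds to one pixel of the output image.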


Figure 3 Scanned light path to the CCD. (XEROX 7650 Reference Manual,
1987)


A series of mirrors passes the reflected light from
the original image to the lens. In this way the focal
length of the reflected image is effectively made longer.
The longer focal length permits the use of a relatively
small lens.

The lens, like the lens in a camera, focuses the 
reflected light from the final mirror onto a specific site 
(corresponding to a pixel) on the surface of the 
photoconverter. 

2. Light Signal to Digital Signal 

The converter units of a scanner contain electronic 
devices which convert, or transform, the information 
reflected from the original image into electronic data. 

As the light reflected from the original image is
passed to the photocells on the surface of the CCD, they
convert that optical signal into an electrical signal (a
voltage) proportional to the "size" of the optical signal.
The "size" of the optical signal is the amount of reflected
light. That is, a white area of the original image reflects
more light, so it generates a greater voltage.

The electrical signal requires one more
transformation before its information can be understood by
the processor. An analog-to-digital converter performs this
final transformation, and passes the digital signal to the
processor.
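The two conversions described in this section (light to voltage, then voltage to a digital value) can be illustrated with a short sketch. The full-scale voltage `V_REF`, the 8-bit default, and the function names are assumptions chosen for the example; a real converter unit will differ.

```python
V_REF = 5.0  # assumed full-scale voltage produced by a pure white area


def photocell_voltage(reflectance):
    """Voltage proportional to the amount of reflected light (0.0 to 1.0)."""
    if not 0.0 <= reflectance <= 1.0:
        raise ValueError("reflectance must be between 0.0 and 1.0")
    return reflectance * V_REF


def analog_to_digital(voltage, bits=8):
    """Quantize a voltage in [0, V_REF] to an integer code of `bits` bits."""
    max_code = (1 << bits) - 1
    return int(voltage / V_REF * max_code + 0.5)  # round to the nearest code


white = analog_to_digital(photocell_voltage(1.0))
black = analog_to_digital(photocell_voltage(0.0))
mid_gray = analog_to_digital(photocell_voltage(0.5))
```

A white area yields the largest code and a black area the smallest, matching the statement that more reflected light generates a greater voltage.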

From this point on, the data manipulation processes
for optical character readers and image scanners are very
different.


D. OPTICAL CHARACTER READERS 
1. What is OCR? 

OCR is defined either as optical character reader
or as optical character recognition; both terms refer to the
process of converting an image of text into computer
readable form (i.e., ASCII).

The original concept of an OCR was a device that
could only digitize characters produced by a typewriter. A
new acronym, ICR, is being used by some vendors to replace
OCR. ICR, defined either as Internal Character Recognition
or Intelligent Character Recognition, includes the
capability to recognize omni-font characters, that is, the
many different fonts and characters produced by today's
computers and printers. For the purpose of this thesis, OCR
will be used to imply both OCR and ICR.


OCR is accomplished by analyzing the image of a 
character and then deciding what character the image 
represents. Unfortunately, OCR is not an exact science and 
consequently, any OCR process is inherently imperfect. 
Recognition errors will occur, regardless of the particular 
OCR technology. 

2. OCR Technology 

Once the scanned image is converted into digital 
data, OCR scanners digitize the characters a line at a time 
and then isolate them, character by character, into frames 
ranging from 24 by 40 pixels up to 30 by 50 pixels. The 
individual frames are stored in RAM for character 
recognition processing. 

There are two broad categories of character
recognition processing commonly used in today's OCR/ICR
scanners. The first, and perhaps the oldest, is commonly
called Matrix Matching or Template Matching. The second, a
more recent development, is referred to as Feature
Extraction.


a. Matrix Matching 
Matrix Matching, in its simplest form, can be
thought of as comparing the image of an unknown character
with images of known characters and finding the nearest
match. Figure 4 illustrates how a scanned image is compared
to a template.





[Figure: comparison of the scanned image and the "B" template;
character not identified.]

Figure 4 Template Matching. (MICRO User's Guide, 1988, p. 20)


The very nature of this technique requires a
complete set of templates for each font the system will
read. This means that multi-font matrix matching systems
need considerable memory for the font libraries.


Another disadvantage associated with matrix
matching is its sensitivity to minor variations in fonts.
Two fonts that look the same to a user may not be recognized 
equally well in matrix matching due to subtle differences in 
character shapes or sizes. On the positive side, matrix 
matching is relatively insensitive to broken characters, 
which occur all too frequently in ordinary documents. 
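A minimal sketch of matrix matching follows, assuming tiny 3 by 5 pixel frames instead of the 24 by 40 to 30 by 50 pixel frames mentioned earlier. The three templates, the `max_mismatch` threshold, and the function names are hypothetical and serve only to illustrate the nearest-match idea.

```python
# Hypothetical one-font template library: "1" is a black pixel, "0" white.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def mismatch(image, template):
    """Count the pixels at which the two bit maps differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image, template)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def match(image, templates=TEMPLATES, max_mismatch=2):
    """Return the nearest template's character, or None if none is close."""
    best = min(templates, key=lambda ch: mismatch(image, templates[ch]))
    return best if mismatch(image, templates[best]) <= max_mismatch else None


# A "T" with one pixel missing from its stem (a broken character).
broken_t = ["111", "010", "010", "000", "010"]
result = match(broken_t)
```

Because the decision is based on the total number of differing pixels, a small break in a stroke changes the score only slightly, which is consistent with the tolerance to broken characters noted above. A second font would need its own set of templates.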

b. Feature Extraction 

The term Feature Extraction is used in the 
industry to describe any OCR technique other than Matrix 
Matching. As a result, the name does not convey much 
information about how OCR is being done. Of the feature 
extraction techniques in use today, the most popular is 
Topological Feature Analysis.

Topological Feature Analysis involves identifying 
the important features of a character image and, based on 
these features, deciding what character the image 
represents. These features can include vertical strokes, 
horizontal strokes, line endings, closed curves, open 
curves, slanted strokes, intersection of strokes, et cetera. 
Figure 5 illustrates the comparison between a scanned image
and the primitive features extracted.


[Figure: two scanned images of characters to be recognized,
each with the primitive features extracted from it.]

Primitive features extracted from the first scanned image:
* Positive sloping stroke on left.
* Negative sloping stroke on right.
* Horizontal crossbar.

Primitive features extracted from the second scanned image:
* 1 curve at the top, open to the left.
* 1 curve at the bottom, open to the left.
* 1 vertical stroke on the left.

Figure 5 Feature Matching. (MICRO User's Guide, 1988, p.
23)


The nature of this technique makes it relatively
insensitive to slight variations in character shape and
size. Another advantage to feature extraction techniques is
that, in most cases, less memory is required for the font
libraries. For example, the features of the letter "e" are
fairly constant for a wide variety of typefaces. A
disadvantage of feature extraction is its sensitivity to
broken characters.
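As one illustration of a topological feature, the sketch below counts the closed curves (enclosed white regions) in a character bit map. It is an assumption-laden example rather than the method of any commercial OCR product: the function name `count_closed_curves` and the tiny bit maps are hypothetical, and a real system would combine many features.

```python
def count_closed_curves(bitmap):
    """Count white regions fully enclosed by black pixels ("1")."""
    rows, cols = len(bitmap), len(bitmap[0])
    # Pad with a ring of white so all outside background is connected.
    grid = [[0] * (cols + 2)]
    grid += [[0] + [1 if p == "1" else 0 for p in row] + [0] for row in bitmap]
    grid += [[0] * (cols + 2)]
    seen = set()

    def flood(start):
        """Mark every white pixel 4-connected to `start`."""
        stack = [start]
        seen.add(start)
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < rows + 2 and 0 <= nc < cols + 2
                if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    stack.append((nr, nc))

    flood((0, 0))                      # the outside background
    closed = 0
    for r in range(rows + 2):
        for c in range(cols + 2):
            if grid[r][c] == 0 and (r, c) not in seen:
                closed += 1            # a white region not reachable from outside
                flood((r, c))
    return closed


letter_o = ["111", "101", "111"]
letter_b = ["111", "101", "111", "101", "111"]
letter_l = ["100", "100", "111"]
```

The count does not depend on the exact size or proportions of the character, which reflects the insensitivity to shape variations described above. A single missing pixel in the outline opens the loop and changes the count, which reflects the sensitivity to broken characters.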


E. IMAGE SCANNERS 

Image scanners work like laser printers in reverse. A 
scanner converts image information into electrical signals 
that can be stored in a computer, whereas a laser printer 
converts the image into charges on the surface of a 
photosensitive drum. 

The electrical signals from the scanner's CCD (which
reads an entire row at a time) are converted to numeric
values and stored in RAM until the entire image is scanned.
Figure 6 illustrates the digital data captured from a single
pass of the letter "T" (one pixel in height) as it is read
by the scanner.

This data is stored in RAM like a two-dimensional mosaic 
of dots that represents the original image. This two- 
dimensional mosaic is otherwise known as a bit map. 

Two parameters are very important in image scanning:
resolution, usually expressed in pixels per inch (PPI), and
the number of levels of grayscale information captured for
each pixel.


[Figure: the strip of the original image illuminated by the
lamp during fast scan 1, shown as reflectance (optical
information, black to white), as voltage (analog information,
black to white), and as a gray image (digital information).]

Figure 6 Conversion of original image to digital information. (XEROX 7650
Reference Manual, 1987)


1. Resolution 

Resolution is defined as the number of pixels read 
or displayed per inch (PPI), both horizontally and 
vertically. 

Most image scanners have resolutions up to 300 PPI,
meaning that a single scanned line across the width of an 8-
by 10-inch image contains 2,400 pixels. If the
lengthwise resolution is also 300 PPI, then there will be
3,000 pixels in a single column the length of the image. In
all, it takes about 7.2 million pixels to represent an 8- by
10-inch image. At one bit per pixel, an image that size
requires approximately 1 megabyte of storage.
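The arithmetic in this paragraph can be checked with a few lines. The sketch assumes one bit per pixel (black or white only) and eight bits per byte; the function name is hypothetical.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixels across and down for an image scanned at `ppi` pixels per inch."""
    return round(width_in * ppi), round(height_in * ppi)


cols, rows = pixel_dimensions(8, 10, 300)
total_pixels = cols * rows          # pixels in the whole image
bytes_at_1_bit = total_pixels // 8  # one bit per pixel, eight bits per byte
```

Under these assumptions the image occupies 900,000 bytes, which is the "approximately 1 megabyte" figure.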

Increasing the resolution allows more detail (finer
lines or sharper changes in gray in an image) to be resolved
and improves the appearance of a scanned image. Figure 7
illustrates the difference in appearance of the letter "a"
at different resolutions.

2. Levels of Grayscale 

If you think of a typical fine grain photograph, the
number of shades of gray that can be reproduced is
essentially infinite, at least as far as the human eye can
see. However, in the digital world there is a limit to the
number of shades of gray. Everything must be represented in
discrete steps and it takes more information (bits of data)
to represent more steps. Thus 4 bits of data are required
to represent 16 levels of gray per pixel, 6 bits to
represent 64 levels, and 8 bits to represent 256 levels.


[Figure: the letter "a" produced at 300 ppi and reproduced at
75 ppi.]

Figure 7 Comparison of different PPI settings. (XEROX 7650 Reference Manual,
1987)


The higher the resolution of an image and the higher
the number of levels of grayscale the image contains, the
higher the quality of the image. Unfortunately, the size of
image files increases with respect to the square of the
resolution and linearly with respect to the number of bits
of grayscale information. One scanned image page could
easily require several Mbytes of storage as illustrated by
Table I. Additionally, processing and retrieval time
increases as the size of the stored data increases.


F. COMPRESSION/DECOMPRESSION OF A SCANNED IMAGE 

Because of the extremely large memory requirements of 
scanned images, it was necessary to develop 
compression/decompression techniques to increase the number 
of pages that could be stored on a particular device. 


Table I FILE SIZE (IN MEGABYTES) FOR AN 8 1/2 X 11 INCH SCANNED
IMAGE. (DESKTOP PUBLISHING, FALL 1988, P. 29)

Resolution                     Grayscale Levels
of scanned image
(ppi)                2            16            64            256
300                 1.1           4.2           6.3           8.4
400                 1.9       [illegible]   [illegible]   [illegible]
600                 4.2          16.8          25.3          33.6
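The relationship behind Table I (size grows with the square of the resolution and linearly with the bits of grayscale) can be sketched as follows. The sketch assumes uncompressed storage, a megabyte of 1,000,000 bytes, and a whole number of bits per pixel; the function names are hypothetical. Under these assumptions it reproduces the 300 ppi row of the table, and other rows may differ slightly because of rounding in the published figures.

```python
import math


def bits_per_pixel(gray_levels):
    """Bits needed for a given number of gray levels (2 levels -> 1 bit)."""
    return math.ceil(math.log2(gray_levels))


def file_size_mb(width_in, height_in, ppi, gray_levels):
    """Uncompressed size in megabytes (assumed 1 MB = 1,000,000 bytes)."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel(gray_levels) / 8 / 1_000_000


row_300 = [round(file_size_mb(8.5, 11, 300, g), 1) for g in (2, 16, 64, 256)]

# Doubling the resolution quadruples the size for the same gray levels.
ratio = file_size_mb(8.5, 11, 600, 2) / file_size_mb(8.5, 11, 300, 2)
```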


Until compression/decompression chips became available,
image compression was performed in software, taking at least
30 seconds for a typical page. Now with image
compression/decompression processor chips, such operations
take only a few seconds.


Most image compression processors are based on the CCITT 
group 3 and 4 standards, developed for use in facsimile 
transmission. These standards are based on a combination of 
two image compression algorithms, known as Modified Huffman 
(MH) and Modified Read (MR) encoding. (Matlin, 1988, p. 75) 

1. Modified Huffman Encoding

Also known as one-dimensional encoding, MH works on 
an image one horizontal line at a time. Each run length, or 
continuous string, of black or white pixels is given a code 
based on the probability of that particular length. To
achieve image compression, the codes for the most probable 
run lengths must be shorter than the run lengths themselves. 

The CCITT Group 3 standard employs MH encoding. In
this algorithm, the codes representing the document run
lengths are selected from one of two 64-element tables
representing black and white run lengths of 0 to 63 pixels.
These tables were derived from statistics based on eight
standard documents, which are available from the CCITT. For
longer run lengths, make-up code tables exist for run
lengths in multiples of 64 pixels.


As an example, if an entire run length of 8 1/2 
inches is white, and the document is scanned at 300 PPI, a 
white run of 2544 pixels is indicated (rounded off to the 
next-lowest byte). This run length can be represented by a 
make-up code for 2496 pixels (000000011110), a white run- 
length code for 48 pixels (00001011) and an end-of-line code 
(000000000001). Since only 32 bits are required to 
represent the original 2544 pixels, a compression ratio of 
79.5:1 is achieved for this line. Of course, this is a 
particularly easy line to compress. 
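The worked example above can be reproduced with a short sketch. Only the three code words quoted in the text are included, so this is not a complete MH encoder: the dictionaries, function names, and the "0"/"1" pixel notation are assumptions made for the illustration.

```python
from itertools import groupby

# Only the codes quoted in the text; the full code tables are not reproduced.
WHITE_MAKEUP = {2496: "000000011110"}
WHITE_TERMINATING = {48: "00001011"}
EOL = "000000000001"


def run_lengths(line):
    """Split a line of pixels ("0" white, "1" black) into (color, length) runs."""
    return [(color, len(list(group))) for color, group in groupby(line)]


def split_run(length):
    """Split a run into a multiple of 64 (make-up) and a remainder of 0 to 63."""
    return (length // 64) * 64, length % 64


def encode_white_line(width):
    """Encode an all-white line using the quoted codes plus end-of-line."""
    makeup, terminating = split_run(width)
    return WHITE_MAKEUP[makeup] + WHITE_TERMINATING[terminating] + EOL


line = "0" * 2544                                  # one all-white scan line
encoded = encode_white_line(run_lengths(line)[0][1])
ratio = len(line) / len(encoded)                   # pixels per encoded bit
```

The encoded line is 32 bits long, giving the 79.5:1 ratio stated in the text.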

2. Modified Read Encoding 

Also known as two-dimensional encoding, MR coding
takes advantage of the vertical correlation between adjacent
lines within a document. It has been estimated that 50
percent of all transitions from white to black, or vice
versa, occur directly below a transition on the previous
line. To encode using the MR algorithm, the relationship
between a transition on the current line and the previous
line is determined. If the current line transition is
within three pixels of a transition on the previous line, a
vertical mode is indicated. This case is represented by a
short code indicating vertical mode, and another code
indicating the relative distance between the current line
transition and the transition above it. If the distance
between transitions is more than three pixels, the pixel
distance is encoded using the appropriate MH code. This is
known as horizontal mode. A third technique, known as pass
mode, is used to realign the transition pointers between the
coding and reference lines.
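The choice between vertical and horizontal mode can be sketched as below. This is a deliberately simplified illustration, not a conforming MR encoder: it pairs each transition with the nearest transition on the reference line, omits pass mode, and emits mode labels rather than code words. The function names and the "0"/"1" pixel notation are assumptions.

```python
def transitions(line):
    """Positions where the color changes, assuming the line starts from white."""
    previous, positions = "0", []
    for index, pixel in enumerate(line):
        if pixel != previous:
            positions.append(index)
            previous = pixel
    return positions


def classify(current, reference, max_offset=3):
    """Label each transition on the current line as vertical or horizontal."""
    reference_positions = transitions(reference)
    modes = []
    for position in transitions(current):
        nearest = min(reference_positions,
                      key=lambda ref: abs(position - ref), default=None)
        if nearest is not None and abs(position - nearest) <= max_offset:
            modes.append(("vertical", position - nearest))  # short code + offset
        else:
            modes.append(("horizontal", None))              # fall back to MH runs
    return modes


reference_line = "0001111000"
shifted_line = "0000111100"     # same shape, shifted right by one pixel
modes = classify(shifted_line, reference_line)
```

Both transitions of the shifted line fall within three pixels of a transition above them, so each would be coded in vertical mode with an offset of one.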




III. OPTICAL STORAGE MEDIA 


A. INTRODUCTION 

Before optical storage, it was difficult to have video, 
audio, image, and text data on-line because of the large 
memory required to store the various types of data. With 
the advent of optical storage media, different forms of 
information can be digitized, integrated, and displayed as a 
single form of information. 

The extremely high-density recording capability of
optical devices enables one 5 1/4 inch optical disk drive to
store 654 million bytes (654 megabytes) of information.
That's equivalent to the amount of data contained on 1800
360K floppy disks or 33 (20 megabyte) hard disks or 260,000
pages of text. A single 12-inch optical platter can store
as much as four GB (gigabytes) of information. Four GB,
or four billion bytes, is equal to the data stored in 160
file cabinets or the amount of data stored on 120 2,400-foot
magnetic tapes (Dukeman, 1988, p.82). Larger discs are
available containing even larger amounts of data; for
example, Eastman Kodak Company recently introduced an optical
system that can store more than a terabyte of information.
One terabyte is equal to a trillion bytes. The Kodak System
6800 uses 14-inch optical discs that store 6.8 gigabytes
(billion bytes) each of randomly accessible information.
The automated library unit can accommodate as many as 150
discs (and 150 times 6.8 billion yields a figure in excess
of one trillion) ("CD-ROMs: The Laser's Edge in Data
Storage", 1987, p. 55).
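The capacity comparisons in this passage are straightforward to check. The sketch below assumes decimal units (a 360K floppy taken as 360,000 bytes, a megabyte as one million bytes); the variable names are illustrative only.

```python
MB = 10**6
GB = 10**9

disk_bytes = 654 * MB                      # one 5 1/4 inch optical disk

floppies = disk_bytes / (360 * 10**3)      # 360K floppy disks
hard_disks = disk_bytes / (20 * MB)        # 20 megabyte hard disks
bytes_per_page = disk_bytes / 260_000      # implied size of one page of text

library_bytes = 150 * 6.8 * GB             # 150 discs of 6.8 gigabytes each
```

Under these assumptions the figures come to roughly 1,800 floppy disks, 33 hard disks, about 2,500 bytes per page of text, and slightly more than one trillion bytes for the automated library.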

Compared with magnetic disks and tapes, optical media is 
almost indestructible. Optical disks can be mailed without 
special precautions, and taken through X-ray machines and
airport scanning devices. Optically stored data is 
unaffected by the environment or magnetic fields. Some 
optical media last for 30 to 100 years, but magnetic media 
has an average life expectancy of only three to five years. 

Optical disks are removable and thus the data can be
securely stored. Optical disks don't stretch over time as
do magnetic tapes. Most optical media can't be altered, and
optical media is less expensive per megabyte of storage.
Data access time for optical disks is still slower than
for magnetic disks, but as the product matures and
proliferates in its target markets of data distribution,
publishing, database archiving, and imaging data, access
time will improve. Paper-intensive environments should
accept increased access times in exchange for large
capacity, unattended backup capability and high-volume
storage of integrated data, text and images. (Levine, R.,
1988, p. 50)


B. ELEMENTARY CONCEPT OF OPTICAL DISKS 

Information is recorded on a plastic-coated glass disc
in the form of pits and lands (also called flats). Pits are
indented 0.12 micrometer deep and 0.6 micrometer wide into
the surface of the disc. Flats measure between 0.9 and 3.3
micrometers in length.

Data on the CD-ROM disk is arranged in a spiral pattern, 
radiating from the center toward the outer edge. A space of 
1.6 micrometers separates the lines of data in the spiral. 
This configuration yields an effective track density of 
16,000 tracks per inch. In contrast, floppy disks have a 
density of 96 tracks per inch. 
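The track density follows directly from the 1.6 micrometer spacing. The short check below uses the exact conversion of 25,400 micrometers per inch; the function name is hypothetical.

```python
MICROMETERS_PER_INCH = 25_400


def tracks_per_inch(track_pitch_um):
    """Number of tracks that fit in one inch at the given spacing."""
    return MICROMETERS_PER_INCH / track_pitch_um


cd_rom_tpi = tracks_per_inch(1.6)
density_ratio = cd_rom_tpi / 96      # compared with a 96-track-per-inch floppy
```

The result is about 15,900 tracks per inch, which the text rounds to 16,000, or roughly 165 times the track density of the floppy disk.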

Before data is inscribed on the disc, it must first be
translated into a special dialect of the binary channel code
that is used to transfer data between more familiar magnetic
formats and computer devices. In magnetic tape formats, the
ones and zeros represent digital information. CD-ROM
channel code assigns ones to mark movement from a land to a
pit or a pit to a land; zeros mark a continuation of lands
or pits in series.
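The transition rule described above can be sketched in a few lines. The "P" (pit) and "L" (land) notation, one letter per channel bit cell, and the function names are assumptions made for the illustration; the actual channel code involves further steps that are not shown.

```python
def surface_to_bits(surface):
    """Read channel bits: 1 at each pit/land change, 0 where the surface continues."""
    return "".join(
        "1" if current != previous else "0"
        for previous, current in zip(surface, surface[1:])
    )


def bits_to_surface(bits, start="L"):
    """Lay down pits and lands so that each 1 becomes a change of surface."""
    surface = [start]
    for bit in bits:
        if bit == "1":
            surface.append("P" if surface[-1] == "L" else "L")
        else:
            surface.append(surface[-1])
    return "".join(surface)


track = "LLLPPPLL"
bits = surface_to_bits(track)
```

Reading the track `LLLPPPLL` gives ones only at the two edges of the pit, and writing those bits back starting from a land restores the same track.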

A low power laser is used to read data from the disc
surface. Light rays are aimed by an optical head over the
information track on the spinning disc, and the amount of
light reflected back to the optical head indicates the
presence of a flat, which reflects more light, or a pit,
which reflects less light. The resulting series of flats and
pits is decoded into data the computer can use. ("CD-ROMs:
The Laser's Edge in Data Storage", 1987, p. 52)


C. CD-ROM: COMPACT DISK-READ ONLY MEMORY 

CD-ROM offers prerecorded optical storage. It's a
read-only device; you can read the information on the disk,
but it can't be altered. Used to distribute a common
database that doesn't have to be updated constantly to
multiple divisions, departments or branch offices, it ensures
that the data is protected against tampering or accidental
erasing and is ideal for archival purposes.


Most systems of this type store 600 MB (megabytes) of
data on a 4.7-inch CD-ROM disk and drive. Half-height 5
1/4-inch drives also are available. CD-ROM manufacturers
have embraced the High Sierra Group and ISO 9660 standard
for file organization, thus the 4.7-inch drive has become
the industry standard. (Levine, R., 1987, p. 50)

CD-ROM is the most economical optical media for mass
distribution of databases. Preparing a master disk is
relatively expensive, but after mass production, the
cost per copy can be as low as $2 in a large-scale
distribution.

For CD-ROM, optical disks are mass produced regardless 
of whether the encoded data represent video, audio or text. 
Once the information has been transcribed into digital 
format and the special cue codes have been added, all the 
data is transferred to a master tape. 

Once the information is recorded on the plastic-coated
glass disc, the glass disc is used to create a metalized
master disc. The surface of the master is transferred onto
nickel shells to form negatives and positives from which
"stamper" copies are made for mass replication. The stamper
is used to transfer the information onto nickel shells
[illegible] reflective aluminum and then covered with lacquer.


D. WORM: WRITE-ONCE, READ-MANY 

These optical storage devices permit one-time writing 
but unlimited reading of data and images. Although you 
can't overwrite or erase previously stored data, you can 
update it by writing new information into a file at another 
location on the disk. The new file then is linked to the 
original file through software and is retrieved in its 
place. This operation is transparent to the user. 
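The update-by-linking behavior described above can be modeled with an append-only list and a software index. This is a conceptual sketch only: the class name `WormDisk`, the in-memory directory, and the method names are assumptions, and real WORM file systems differ in how the links themselves are stored.

```python
class WormDisk:
    """Toy model of write-once storage with software-linked updates."""

    def __init__(self):
        self._sectors = []     # written once, never modified or erased
        self._directory = {}   # assumed software index: name -> latest location

    def write(self, name, data):
        """Write data at a new location and link the name to it."""
        self._sectors.append(data)
        location = len(self._sectors) - 1
        self._directory[name] = location
        return location

    def read(self, name):
        """Return the most recent version, transparently to the user."""
        return self._sectors[self._directory[name]]

    def read_location(self, location):
        """Return whatever was originally written at a location."""
        return self._sectors[location]


disk = WormDisk()
first = disk.write("thesis.txt", "draft")
second = disk.write("thesis.txt", "final")
```

After the second write, a read by name returns the new file, while the original data remains on the disk at its first location and continues to consume space.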

WORM optical technology consists of a high-intensity 
laser beam that heats and permanently changes the surface of 
the disk as it writes and stores information on the disk. 
The writing process, which varies from vendor to vendor, 
ultimately results in a change in reflectivity of the 
information layer of the disk. Figure 8 illustrates how a 
WORM drive works. 

Most WORM drives use a glass- or plastic-based substrate
to enclose a sensitive recording layer. Eastman Kodak Co.
uses an aluminum substrate in its 14-inch Optical Disk
System 6800.









[Figure: diagram of a WORM drive showing the media, the
objective lens, and the coarse tracking motor.]

Figure 8 How a WORM drive works. (PC Magazine, June 1987)


Presently, three recording technologies are used for
WORM optical disks: ablative, vesicular and phase-change.
1. Ablative Recording 
Ablative technology stands out as the most common 
method of writing data. This technology, also known as pit- 
forming, burns a hole in the active layer of the media. 
2. Vesicular Recording 
Vesicular technology, also known as bubble-forming,
heats the media until it melts and forms a bubble, or
explosion, of the polymers on the active layer of the media.


3. Phase-Change Recording 
Phase-change technology actually produces a change
in the media from a crystalline to an amorphous state.


E. MAGNETO-OPTICAL DISKS: ERASABLE OPTICAL STORAGE 

Magneto-Optical disks provide the same capability of
storing and retrieving data as present magnetic drives do,
but with a storage capacity 12 to 50 times the amount of
data currently packed on magnetic hard disk drives.

Magneto-Optical disk drives use a combination of
technologies to store and retrieve information. They rely
upon materials whose particles can be magnetically oriented
either up or down but whose orientation can't be changed
easily at normal temperatures.

Storing information on the disk is performed by a strong 
laser beam, as illustrated in Figure 9, which heats a 
microscopic spot in a multi-layered material sealed in the 
rotating disk. When the temperature of the magneto-optical 
layer reaches a certain point, its magnetic orientation can 
be changed easily by a magnetic field in the drive. 

After the laser beam is removed, the exposed disk region
retains its magnetized orientation.


[Figure: writing to the disk, where heated particles change
orientation, and reading from the disk, where the direction of
the reflected beam indicates the stored value.]

Figure 9 Magneto-Optical disk concept. (Infoworld, 1988)


To record information, a first pass is made with the 
laser and magnetic field to erase an entire section of the 
recording surface by orienting all the spots the same way, 
to represent zeros. Then, a second pass is made with the 
magnetic field reversed, but this time the laser heats only 
spots to be changed from zero to one. 
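The two-pass recording procedure can be modeled as follows. The sketch treats a section of the recording surface as a list of bits and ignores all physical detail; the function name and the returned count of heated spots are assumptions added for illustration.

```python
def write_section(old_bits, new_bits):
    """Model the erase pass and the write pass over one section of the disk."""
    if len(old_bits) != len(new_bits):
        raise ValueError("the section and the new data must be the same length")

    # Pass 1: laser and magnetic field orient every spot the same way (zero).
    section = [0 for _ in old_bits]

    # Pass 2: field reversed; the laser heats only the spots that become one.
    heated = [index for index, bit in enumerate(new_bits) if bit == 1]
    for index in heated:
        section[index] = 1

    return section, len(heated)


section, heated_spots = write_section([1, 1, 0, 1], [0, 1, 1, 0])
```

Because the first pass clears the whole section, the previous contents have no effect on the result, and every rewrite costs two passes over the surface.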

To read the information later, as illustrated in Figure
9, a weaker, polarized laser beam is shone at the spot.
Depending on the direction of the magnetization of the
recording layer, the polarization of the reflected beam
rotates slightly in one direction or the other, a phenomenon
known as the Kerr effect.

After striking the surface, the polarized beam is
reflected back to a photodetector, which reads the
variations. With the stronger beam, the information can
later be "erased" by again heating the spot and altering the
magnetic orientation.


F. SUMMARY 

When designing an archival information system, the
optical media of choice is WORM. The cost of producing the
master disk would be prohibitive for CD-ROM unless there were
a distribution base to make it cost effective.
Magneto-optical is still in development, but does provide an
alternative to WORM if there is a need to erase the original
disk. However, magneto-optical disks are more expensive
than WORM disks, leaving WORM as the economic medium of
choice for archival purposes.


IV. OPTICAL INFORMATION PROCESSING 


Optical information processing is a relatively new
technology, utilizing optical storage media to store the
data created by a document data processing system.


A. DOCUMENT DATA PROCESSING 

What is Document Data Processing? Document Data 
Processing is the procedure of converting information stored 
on paper to digitized format. Document data processing 
includes the ability to electronically store, retrieve and 
reproduce the original information contained on paper. 

Document data processing systems in the past have used
optical character readers to convert paper information to
electronic format. Microfiche or magnetic storage devices
were used to store the electronically converted data.

In the past, the high expense and relatively low capacity
of magnetic media have precluded its use for storing
archival quantities of documents in other than character
coded format. (Kapoor, 1988, p. 28)

Before the development of the optical disk, it was
impractical to maintain images on-line because of the large
memory requirements of storing a single page. (Grigsby,
1988, p. 62)

With the advancements in optical storage media and the
techniques to compress images into manageable sizes,
technology exists today to design a document data processing
system that will merge and manage diverse forms of
information, including image, text, alphanumeric data and
voice.


B. OPTICAL INFORMATION SYSTEM 

Optical information processing systems provide both an
image and a data processing solution. These digital systems
utilizing optical storage media to store and retrieve are
the missing link in the integration of paper documents,
microfilm, computer data, and word processing text. This
technology provides solutions not previously available to
solve information access and distribution requirements
associated with a total information transaction. (Grigsby,
1988, p. 60)

Optical information systems are an ideal alternative to
document data processing systems. Optical disks store not
only images and data, but also the retrieval software for
index data management.


[Figure 10 block labels: Source Document; Scanning and
Compression; Optical Storage; Retrieval and Decompression;
Printing]

Figure 10 Optical Information System.


A basic stand alone PC-based optical information 
processing system consists of an image scanner to create 
digitized images of documents, workstations utilizing a bit- 
mapped high resolution monitor to display the images, 
magnetic hard disks to store indexed information and act as 
a buffer before storing the digitized image, optical disks 
to store and retrieve the images and a laser printer to 
reproduce the image. 

These stand-alone systems are modular and can be
expanded into large, organization-wide document image and
data management systems with multiple workstations and
optical disk storage libraries.

When the optical storage media are connected to a
network, several people can review the stored document
simultaneously, eliminating the delays that result from
passing a paper file from person to person. This process
allows the actual paper file to be stored in a low cost,
secure area. Paper files, which are subject to theft and
accidental loss, often must be sent out for review. Each
journey risks the integrity of the file and costs additional
time and expense.

An optical storage system may include jukeboxes.
Optical jukeboxes use robotics to mount and dismount a large
number of optical disks. A jukebox may contain as many as
95 disks and up to five separate drives, yielding quick
access to over three million images. When an image is
requested, the correct optical disk is robotically selected
and mounted, and the desired image is displayed in seconds.


C. BENEFITS OF OPTICAL INFORMATION PROCESSING 
A primary reason for using computer paper, computer
output microfiche, and magnetic tape for archival storage is
low cost. Until now, it has been too expensive to keep
massive amounts of data "on-line". Optical information
systems provide on-line desktop delivery of current, as well
as historical, computer information. (Grigsby, 1988, p. 62)
1. Huge Capacity/Space Savings 

Dozens of comparisons have been made to dramatize
the optical disk system's ability to compress volumes of
paper onto a disk. This ability provides high-density
on-line storage at a comparatively low cost, as demonstrated
by the following table:


Table II  STORAGE CAPACITY AND COST COMPARISON OF DIFFERENT
          STORAGE MEDIA. (MICROSOFT PRESS, 1986)

MEDIA            Capacity (in MB)    Cost per MB
Floppy Disk      .36-1.2             1093.59
Hard Disk        5-50                63.63
Magnetic Tape    30-300              54.64
CD-ROM           540-680             2.48
WORM             200-300             17.40
Large Optical    1,000-4,000         21.41





2. Speed of Retrieval
"In a current environment, even in a hurry, it may
take 10 minutes to retrieve a paper document, copy it, add
notes, and fax it to a remote office," says Michael Florio,
vice president of Document Technology, Inc. (Dukeman, 1988,
p. 82). With an optical information processing system, the
process would take only 30 seconds, drastically reducing the
clerical functions of research, filing and hard-copy
reproduction. As another example, with an optical disk
jukebox storing 280 gigabytes, a document can be accessed in
7 to 10 seconds (Dukeman, 1988, p. 82).
3. Shared Access/Remote Availability 
Information on paper can be in only one place at a 
time, or it can be copied and multiplied beyond control. In 
an optical information system only one copy of an image 
exists, but users have access on an "as needed" basis. 
4. File Integrity 
File integrity is another significant reason that
automation of paper documents is important. Many documents
are simply lost or not available due to misfiling or
out-of-file situations. With a WORM optical disk, the
document cannot be misplaced or altered once it has been
written. Additionally, when connected to a network, several
people can review the stored document simultaneously,
eliminating the delays that result from passing a paper file
from person to person.
5. Archival Life 
Storage life of optical media is estimated to be
more than 30 years, and the data can be duplicated onto new
media at more frequent intervals. The life expectancy of optical
disks far surpasses the estimated life of 5 to 10 years for 
magnetic storage. At the 1988 AIIM show in Chicago, Sony 
Corporation announced that accelerated tests showed that 
Sony WORM media is capable of a one-hundred-year life 
(Dukeman, 1988, p. 84). 
6. Cross Reference Indexing 
Once identified by a multiple level cross reference 
index, images can be retrieved by a number of desired 
fields. Indexing multiple key fields allows greater 
flexibility for accessing data. 
7. No Head Crashes 
Unlike magnetic disks, optical disks do not 


experience head crashes. 




8. Distribution 
By coupling an optical system with a communication
device, users can send and receive documents in seconds.
Adding a facsimile capability enables the system to send a
document image to virtually anywhere there is another fax
machine.
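
The cross-reference indexing described in item 6 above can be
illustrated with a minimal sketch. It assumes a hypothetical
in-memory index; the class name ImageIndex, the key fields
(author, year, subject) and the image identifiers are
illustrative only and are not taken from any system evaluated
in this thesis.

```python
from collections import defaultdict


class ImageIndex:
    """Toy multi-field cross-reference index for stored page images."""

    def __init__(self, key_fields):
        self.key_fields = tuple(key_fields)
        # One inverted index per key field: field -> value -> set of image ids.
        self._by_field = {name: defaultdict(set) for name in self.key_fields}

    def add(self, image_id, **fields):
        """Register an image under every key field supplied for it."""
        for name in self.key_fields:
            if name in fields:
                self._by_field[name][fields[name]].add(image_id)

    def find(self, **criteria):
        """Return sorted ids of images matching every field=value pair."""
        result = None
        for name, value in criteria.items():
            matches = self._by_field[name].get(value, set())
            result = set(matches) if result is None else result & matches
        return sorted(result or [])


# Hypothetical entries, for illustration only.
index = ImageIndex(["author", "year", "subject"])
index.add("disk03/img0412", author="Lind", year=1987, subject="CD-ROM")
index.add("disk07/img0098", author="Taylor", year=1989, subject="WORM")
index.add("disk07/img0099", author="Taylor", year=1989, subject="scanners")

print(index.find(author="Taylor", year=1989))
print(index.find(subject="CD-ROM"))
```

Because each key field has its own lookup table, the same
image can be reached by author, by year, by subject, or by any
combination of them, which is the flexibility that indexing
multiple key fields is meant to provide.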




V. METHODOLOGY AND DATA 


A. INTRODUCTION 

In search of hardware to evaluate for an optical
information system, the researcher acquired data from three
sources: 1) in-place operational systems, such as the EDS
Deers Enrollment Processing Center, which uses a 3M
Docutron 9000 optical information system, and the Defense
Language Institute Graphics Department, which uses a
Kurzweil 4000 optical character recognition system, both in
Monterey, Ca.; 2) companies that specialize in optical
information system integration, such as TAB Products Co.,
Palo Alto, Ca., Anamet Laboratories, Inc., Hayward, Ca.,
LaserData Inc., Lowell, Ma., and Wang Laboratories, Inc.,
Lowell, Ma.; and 3) vendors that sell either OCR scanners or
image scanners, such as Xerox or Western Office Supply in
Santa Clara.

Before the research questions are answered, the findings
for both optical character readers and image scanners are
summarized.


B. OPTICAL CHARACTER READER EVALUATION 

To meet the need of converting thesis documents into
digital format, several OCRs, including the most
technologically advanced models available on the market,
were evaluated.

Two top-of-the-line OCR scanners were evaluated. The
Kurzweil 5000 was demonstrated by Western Office Supplies of
Santa Clara, Ca., and the Calera CDP 3000 was demonstrated
by Anamet Laboratories, Inc. of Hayward, Ca. Both have
self-contained processors and are designated as omni-font
readers. Both have Automatic Document Feeders (ADF) capable
of processing 50 pages at a time. With their built-in
processors, both were able to scan in the background while
permitting the PC to perform other functions.

Additionally, True Scan, also a product of Calera, was 
demonstrated by Western Office Supplies. True Scan uses an 
image scanner, an extended RAM board added to a PC and 
software to perform OCR/ICR. True Scan does not have the 
capability to perform background scanning. 

Optical character readers were originally designed to
read text only. Current models advertised the ability to
read a page and distinguish between graphics and text. In a
sense this was true: they distinguished several fonts of
text and ignored any form of graphics. The selection to scan
in either an image mode or a text mode had to be made prior
to scanning.

If a single page contained both text and graphics, the
page would need to be scanned once for text and once for
graphics. Then, to obtain a digitized copy of the original
page, the text and the bit map image of the graphics would
need to be merged.

This process may work well in an office environment 
where only a few documents a day might be digitized. But 
when converting large document databases, the time to 
preview each page prior to scanning, scan the page twice if 
needed, and merge the text with graphics would be too time 
intensive to be practical for a large conversion project. 

Optical character recognition is still not an exact
science. It is the opinion of the researcher that the
recognition capability of today's models is vastly improved
over earlier models, but there were still numerous errors
made by all models previewed.


Newer models can now distinguish text printed in columns
and around graphics but still have great difficulty with
formats. A full page of text from a thesis averaged 2 to 3
character recognition errors. Recognition errors were easy
to correct, but it was the experience of the researcher that
correcting format errors took considerably more time.

Appendix A contains pages from the original thesis used
for research. Appendix B contains the unedited results of
scanning those same pages with a Kurzweil 5000. The Kurzweil
5000 did an excellent job of reading text, such as the small
print on page 1 of the thesis document, DD Form 1473. But
Appendix B also illustrates how time intensive it would be
to correct the recognition errors and to reformat the page
in an acceptable form for permanent storage.


C. IMAGE SCANNER EVALUATION 
Several image scanner models were reviewed. The only
noticeable difference between the various models, as
illustrated by Table III, was the amount of time it took to
scan a standard 8 1/2 x 11 page. The resolution and
grayscale quality were comparable among all models.

The Fujitsu Model M3094E was used by TAB Products and 
Anamet Laboratories in the integration of their optical 
information systems. The Fujitsu model was rated fastest 
among several commercially available models, averaging 7 
seconds per page at 200 PPI resolution. 


Table III  TIME COMPARISON REQUIRED TO SCAN A SINGLE PAGE.

SCANNER                        TIME TO SCAN A SINGLE PAGE
                               (resolution = 200 PPI)

Hybrid                         < 1 sec
(3M Docutron 9000 system)
Fujitsu M3094E                 7 sec
Microtek MS 300A               24 sec
XEROX 7650                     31 sec


D. RESEARCH QUESTIONS 
Utilizing an original copy of a thesis and the optical
character recognition and image scanners discussed above,
the following questions were addressed:


1. What is the time required to convert text to data? 
a. Optical Character Recognition scanners 

Time required to convert text to data depended
upon the processor of the individual scanner selected. The
scanners with the greater built-in processing capability,
such as the Kurzweil 5000 or the Calera CDP 3000, averaged
30 seconds or less per page. Using the True Scan board
attached to an image scanner, the average time was 60
seconds per page. The Kurzweil 4000, using 1985 technology,
averaged 90 seconds per page.

Times mentioned do not include the time required 
to correct the errors, nor the time it would take to rescan 
the page as graphics and attempt to combine the two. Both 
error correction and combining graphics could take up to an 
additional 5 minutes per page dependent upon the number of 
errors and format of the graphics. 

b. Image Scanners 

Time required to convert an image to digital 
format was dependent upon the individual processor tested 
and the resolution selected. 

Times for the different processors ranged from
less than 1 second per page (scanning both sides) for the
hybrid scanner designed for the 3M Docutron 9000 system to
31 seconds per page for the XEROX 7650 image scanner.
Comparisons were based on scanning at 200 PPI. (See Table
III.)

Decreasing or increasing the resolution
correspondingly decreased or increased the time to scan.
For the XEROX 7650, decreasing the resolution to 75 PPI
decreased the time to scan to 14 seconds. Increasing to 300
PPI required 38 seconds and increasing to 400 PPI required
154 seconds. The same page was used for all of these
timings, which demonstrate the increased time required when
increasing scanning resolution.
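
The relationship between resolution and scan time reported
above can be examined with a short sketch. It compares the
XEROX 7650 timings with the number of sample points on an
8 1/2 x 11 page at each resolution. The function names are
illustrative, and using pixel count as the yardstick is an
assumption made here for comparison, not a model stated in
the research.

```python
# Scan times reported above for the XEROX 7650 (seconds per page).
XEROX_7650_SECONDS = {75: 14, 200: 31, 300: 38, 400: 154}

PAGE_WIDTH_IN = 8.5    # standard page width, inches
PAGE_HEIGHT_IN = 11.0  # standard page height, inches


def pixels_per_page(ppi):
    """Number of sample points on a full page at the given resolution."""
    return PAGE_WIDTH_IN * ppi * PAGE_HEIGHT_IN * ppi


def compare_to_baseline(times, baseline_ppi=200):
    """Return (ppi, seconds, pixel ratio, time ratio) rows vs. the baseline."""
    base_pixels = pixels_per_page(baseline_ppi)
    base_seconds = times[baseline_ppi]
    rows = []
    for ppi in sorted(times):
        rows.append((
            ppi,
            times[ppi],
            pixels_per_page(ppi) / base_pixels,
            times[ppi] / base_seconds,
        ))
    return rows


for ppi, seconds, pixel_ratio, time_ratio in compare_to_baseline(XEROX_7650_SECONDS):
    print(f"{ppi:>3} PPI  {seconds:>3} s  "
          f"pixels x{pixel_ratio:.2f}  time x{time_ratio:.2f}")
```

On these figures, moving from 200 to 400 PPI quadruples the
pixel count while the measured time grows roughly fivefold,
whereas moving from 200 to 300 PPI more than doubles the
pixel count for only a modest increase in time. The measured
times therefore do not appear to scale in simple proportion
to the pixel count.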

2. How accurate is converting to digital format? 

If the document was strictly text, the accuracy rate
was quite high for OCR scanners. They rarely had more than
2 or 3 errors a page, but if there was any form of graphics,
such as figures or tables, the error rate went up
drastically.

For image scanners, the accuracy is a matter of
resolution. At a resolution of 75 or 100 PPI, the quality
was good but generally not as good as the original, with
some of the smaller details harder to read. 200 PPI
resolution was just as good or better than the original.
300 or 400 PPI resolution produced a product that was much
better than the original. Appendix C demonstrates the
quality difference between pages scanned at 200 and 300 PPI.
3. What are the digital storage requirements? 

Text scanned by optical character readers required
very little storage space as compared to image scanned text.
The entire 8 pages read by the Kurzweil 5000 in Appendix B
required only 15,171 bytes to store.

Image scanning requires vastly larger amounts of
memory. An uncompressed page scanned at 300 PPI requires
approximately 1 megabyte of memory. A compressed page at 300
PPI still requires approximately 40,000 bytes of storage
space. Table IV provides a sample of the compressed file
sizes required for individual pages of Appendix A.


Table IV  FILE SIZES REQUIRED FOR INDIVIDUALLY SCANNED
          PAGES (IN BYTES).

Page #     200 PPI     300 PPI
Cover      13,717      20,588
1          50,405      76,600
4          24,207      36,361
17         15,526      23,834
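
The storage figures above can be checked with a short sketch.
It assumes a bilevel image at 1 bit per pixel on an
8 1/2 x 11 page, which is an assumption made here because it
reproduces the "approximately 1 megabyte" figure; the
research does not state the bit depth. The compressed sizes
are the Table IV values.

```python
# Compressed file sizes from Table IV, in bytes: page -> (200 PPI, 300 PPI).
TABLE_IV = {
    "Cover": (13_717, 20_588),
    "1": (50_405, 76_600),
    "4": (24_207, 36_361),
    "17": (15_526, 23_834),
}


def uncompressed_bytes(ppi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Raw bitmap size of one page; 1 bit per pixel is an assumption."""
    return width_in * ppi * height_in * ppi * bits_per_pixel / 8


def mean(values):
    values = list(values)
    return sum(values) / len(values)


mean_200 = mean(size_200 for size_200, _ in TABLE_IV.values())
mean_300 = mean(size_300 for _, size_300 in TABLE_IV.values())

print(f"uncompressed at 300 PPI: {uncompressed_bytes(300):,.0f} bytes")
print(f"mean compressed at 200 PPI: {mean_200:,.0f} bytes")
print(f"mean compressed at 300 PPI: {mean_300:,.0f} bytes")
print(f"mean compression ratio at 300 PPI: "
      f"{uncompressed_bytes(300) / mean_300:.1f} to 1")
```

Under that assumption the raw page is a little over one
million bytes, and the mean of the four sampled pages at 300
PPI is close to the 40,000 bytes quoted above.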




VI. ANALYSIS 


A. THESIS DOCUMENT STORAGE REQUIREMENTS 

The area in the Naval Postgraduate School's Knox library
where the thesis documents are stored is referred to as the
thesis cage. It is so named because of the wire walls that
enclose the area to control access.

The thesis cage comprises an area of 23 feet by 21 feet
with a ceiling height of 8 feet. This small area, utilizing
compact shelving, contains approximately 23,500 bound thesis
copies. For each thesis title stored, there is one hard
bound and one soft bound document. Therefore, there are
approximately 11,750 original thesis documents, dating back
to the early 1950s. Each quarter an additional 200 to 250
new theses are produced, adding approximately 1000 new
thesis documents annually.

Selecting 20 thesis documents at random, the researcher
found that the average size of a thesis was 108.3 pages, of
which 27.9 pages were graphs or charts and 7.4 pages were
pictures. It is important to note that each thesis contained
approximately 25 percent graphics of some form. For this
reason OCR was not considered a viable alternative for
converting the thesis documents and therefore will not be
considered in the continuation of the analysis.

For a four-month period from September to December 1988,
the thesis cage was used, on the average, 4.2 times per day.
(This average does not include Sundays and school breaks
between quarters, when the library was not being utilized to
its fullest capacity.) With this in mind, one optical disk
processing work station would be more than adequate to
fulfill the needs for search and retrieval.

Available on the market today are 1.6 gigabyte per side 
12 inch optical WORM disks, with two gigabytes per side and 
greater being evaluated for market introduction. 

Using 12 inch optical disks and a recommended 200 PPI
scanning resolution to replace the 11,750 thesis documents,
it would take approximately 33 gigabytes, or 10.3 optical
disks, to store all theses currently in the cage. The 1000
new thesis documents added each year would require 2.8
gigabytes, or approximately one new 12 inch disk each year.
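
The storage estimate above can be reproduced with a short
sketch. The per-page size used here is the mean of the four
200 PPI file sizes in Table IV, and the disk capacity is
taken as two sides of 1.6 gigabytes each. Both are
assumptions about how the estimate was derived; they are
used because they reproduce the stated totals.

```python
THESES_IN_CAGE = 11_750
NEW_THESES_PER_YEAR = 1_000
PAGES_PER_THESIS = 108.3

# Assumption: mean of the Table IV 200 PPI file sizes, in bytes.
BYTES_PER_PAGE = (13_717 + 50_405 + 24_207 + 15_526) / 4

# Assumption: a 12 inch WORM disk holds 1.6 gigabytes on each of two sides.
BYTES_PER_DISK = 2 * 1.6e9


def storage_bytes(thesis_count):
    """Estimated compressed image storage for the given number of theses."""
    return thesis_count * PAGES_PER_THESIS * BYTES_PER_PAGE


backlog_bytes = storage_bytes(THESES_IN_CAGE)
yearly_bytes = storage_bytes(NEW_THESES_PER_YEAR)

print(f"backlog: {backlog_bytes / 1e9:.1f} GB, "
      f"{backlog_bytes / BYTES_PER_DISK:.1f} disks")
print(f"per year: {yearly_bytes / 1e9:.1f} GB, "
      f"{yearly_bytes / BYTES_PER_DISK:.2f} disks")
```

Rounding the backlog up to whole disks gives the 11 disks
used in the cost analysis, and the yearly growth fits on a
single disk.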




B. COST ANALYSIS 
1. Optical information system cost analysis. 

A system that fulfills the requirement for an
optical information system for the library is the TAB
Laser-Optic Filing System 2000. A single desk measuring
eight feet in length contains the complete system. The
system integrates the following components: one image
scanner, one high resolution monitor, one CPU and hard disk,
one 12 inch optical disk drive, and a laser printer.

The cost of the TAB Laser-Optic Filing System 2000
with the 12 inch optical disk drive is $69,950. Each 12 inch
optical disk costs $575. Initial purchase would require 11
optical disks to store the entire thesis library, plus two
additional disks to cover the first two years of expected
additional thesis documents. Technology is expected to
increase the storage capacity of the 12 inch optical disk,
so purchasing more than two years in advance is not
recommended. The cost of purchasing the 13 optical disks is
$7,475. Therefore, the initial hardware/software cost to
implement the system is $77,125.
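
The hardware arithmetic above can be laid out as a short
sketch using only the prices quoted in the text. The variable
names are illustrative. Note that the quoted system price and
disk cost sum to $77,425, which is $300 more than the stated
total of $77,125; the sketch reports both figures rather than
guessing which one was intended.

```python
SYSTEM_PRICE = 69_950        # TAB Laser-Optic Filing System 2000, as quoted
DISK_PRICE = 575             # each 12 inch optical disk, as quoted
DISKS_FOR_BACKLOG = 11       # disks to hold the existing thesis library
DISKS_FOR_GROWTH = 2         # one disk per year for the first two years
STATED_TOTAL = 77_125        # total as stated in the text

disk_count = DISKS_FOR_BACKLOG + DISKS_FOR_GROWTH
disk_cost = disk_count * DISK_PRICE
computed_total = SYSTEM_PRICE + disk_cost

print(f"{disk_count} disks cost ${disk_cost:,}")
print(f"system plus disks: ${computed_total:,}")
print(f"difference from stated total: ${computed_total - STATED_TOTAL:,}")
```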




2. Hard-copy to digital conversion cost analysis. 

Conversion cost is determined by time and cost for 
an individual to complete the conversion. 

Conversion time consists of 1) the time to prepare
the document for scanning, such as removing staples, 2) the
time to scan the document, 3) the time to index the document
for storage on the optical disk, and 4) the time required to
actually store the document on the disk.

The image scanner of the TAB Laser-Optic Filing
System 2000 can scan a page in an average of 7 seconds at
200 PPI. For an average thesis document of 108.3 pages,
scanning would take approximately 12.6 minutes. Adding
approximately 5 minutes to prepare each thesis document, 2
minutes to index it and less than a minute to store it to an
optical disk, it would take a total of approximately 20
minutes to prepare, scan, index and store each thesis
document.
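
The conversion time estimate above, and the staffing figures
that follow from it, can be worked through in a short sketch.
The one-minute storage time is taken as an upper bound for
"less than a minute", and the five-day work week is an
assumption made here; the other inputs are the figures
quoted in the text.

```python
import math

SCAN_SECONDS_PER_PAGE = 7
PAGES_PER_THESIS = 108.3
PREP_MINUTES = 5
INDEX_MINUTES = 2
STORE_MINUTES = 1                 # upper bound for "less than a minute"
ROUNDED_MINUTES_PER_THESIS = 20   # figure used in the text
WORKDAY_HOURS = 8
WORKDAYS_PER_WEEK = 5             # assumption
THESES_TO_CONVERT = 11_750

scan_minutes = SCAN_SECONDS_PER_PAGE * PAGES_PER_THESIS / 60
total_minutes = scan_minutes + PREP_MINUTES + INDEX_MINUTES + STORE_MINUTES

theses_per_day = WORKDAY_HOURS * 60 // ROUNDED_MINUTES_PER_THESIS
work_days = math.ceil(THESES_TO_CONVERT / theses_per_day)
work_weeks = work_days / WORKDAYS_PER_WEEK

print(f"scan time per thesis: {scan_minutes:.1f} minutes")
print(f"total time per thesis: {total_minutes:.1f} minutes")
print(f"{theses_per_day} theses per day, {work_days} days, "
      f"{work_weeks:.0f} work weeks")
```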

Scanning 1 thesis document every 20 minutes equals
24 thesis documents scanned in an eight hour day. With one
individual assigned full time, it would take 490 days, or 98
work weeks, to convert the entire library of 11,750 thesis
documents. Assuming an individual is hired as a GS-3 to
perform the conversion, with annual salary and benefits
worth approximately $13,800, it would require a total of
$26,000 to complete the initial task.

Additional conversion of 200 to 250 new thesis
documents each quarter would require an individual for ten
working days per quarter. Assuming the same salary
requirements, it would cost approximately $2,120 per year
to convert new documents.
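
The labor costs above can be approximated with a short
sketch. It assumes the annual salary and benefits figure is
spread over 52 paid weeks of five working days; that
convention is an assumption made here, chosen because it
comes close to the stated results.

```python
ANNUAL_SALARY_AND_BENEFITS = 13_800   # GS-3 figure quoted in the text
PAID_WEEKS_PER_YEAR = 52              # assumption
WORKDAYS_PER_WEEK = 5                 # assumption


def labor_cost(work_days):
    """Cost of the given number of working days at the quoted annual rate."""
    weeks = work_days / WORKDAYS_PER_WEEK
    return ANNUAL_SALARY_AND_BENEFITS * weeks / PAID_WEEKS_PER_YEAR


initial_cost = labor_cost(490)       # 490 days, or 98 work weeks
yearly_cost = labor_cost(4 * 10)     # ten working days in each of 4 quarters

print(f"initial conversion: ${initial_cost:,.0f}")
print(f"new documents per year: ${yearly_cost:,.0f}")
print(f"new documents, first two years: ${2 * yearly_cost:,.0f}")
```

The results land within a few dollars of the rounded figures
of $26,000 and $2,120 used in the text.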

3. Summary of Cost Analysis. 

The total cost to initially implement an optical
information system is $103,125: the cost of the system and
disks, $77,125, plus the initial cost of converting
currently stored documents, $26,000. Continuing to convert
documents as they arrive would then cost an additional
$4,240 for the first two years.


VII. CONCLUSIONS, RECOMMENDATIONS 


A. CONCLUSIONS 

The design and implementation of a hard-copy to digital
format optical information system has the potential of
solving storage capacity problems not only for the NPS Knox
library, but also for other document archive facilities both
in the Department of Defense and in other governmental and
civilian agencies.

The technology to convert hard-copy documents into
digital format is readily available today. The image
scanning optical information system converts, stores and
retrieves documents in a matter of seconds.

So the issue in determining whether or not to convert
hard-copy technical documents into digital format is
strictly cost: the cost of hardware and software to
implement the system, the initial cost of converting
currently stored documents, the cost to convert documents as
they arrive, and finally, the cost of maintaining the system
once it is on-line. All these costs must then be traded off
against benefits in the form of space made available for
other uses, faster search and retrieval times, and an
overall increase in use due to easier accessibility.
When reviewing the performance of imaging systems in 
government, one can develop a cost justification based 
on an agency's savings in information processing costs 
and storing of paper. But perhaps the true bottom line 
should be measured in terms of service delivered to the 
public. (Levy)
B. RECOMMENDATIONS 
During the researcher's analysis of the thesis cage, it
was noted that two documents exist for each thesis: one
hard bound copy and one soft bound copy. To save space
immediately, the researcher recommends removing the soft
bound documents for storage elsewhere. This would free 50
percent of the space in the thesis cage. The hard bound
thesis documents could then be treated as any other text in
the library, being recalled if another individual needs to
review a checked out thesis.
At the same time or in the future, if the decision is 
made to convert to an optical information system, the soft 


bound thesis documents could be used for scanning without 


interrupting the current storage system. 




If the decision is whether to save much needed floor
space in the library or build new, purchasing an imaging
optical information system is a highly recommended
alternative. The system could not only convert thesis
documents, but could also be expanded to convert other texts
in the library.

To reduce the cost of implementing an optical 
information system, the researcher recommends a follow-on 
thesis, researching and building an in-house imaging optical 
information system. 

A question for consideration in a follow-on thesis
would be the feasibility of scanning graphics and combining
them with text during thesis preparation. This would reduce
the cut and paste that is currently done, reduce the overall
storage requirements of a thesis, and eliminate the need to
scan future theses. The final digitized copy of the thesis
document could then be forwarded to the library and
distributed to other government agencies at a lesser cost.


APPENDIX A 


ORIGINAL PAGES USED FOR SCANNING RESEARCH 


Appendix A contains the original pages from the sample 


thesis document used for scanning research. 




NAVAL POSTGRADUATE SCHOOL
Monterey, California


THESIS


OPTICAL LASER TECHNOLOGY, SPECIFICALLY
CD-ROM, AND ITS APPLICATION TO THE STORAGE
AND RETRIEVAL OF INFORMATION


by

David J. Lind


June 1987


Thesis Advisor:


Approved for public release; distribution is unlimited


UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE


REPORT DOCUMENTATION PAGE


1a. REPORT SECURITY CLASSIFICATION
1b. RESTRICTIVE MARKINGS
2a. SECURITY CLASSIFICATION AUTHORITY
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE
3.  DISTRIBUTION/AVAILABILITY OF REPORT
    Approved for public release; distribution is unlimited
4.  PERFORMING ORGANIZATION REPORT NUMBER(S)
5.  MONITORING ORGANIZATION REPORT NUMBER(S)
6a. NAME OF PERFORMING ORGANIZATION
    Naval Postgraduate School
6b. OFFICE SYMBOL
    Code [illegible]
6c. ADDRESS (City, State, and ZIP Code)
    Monterey, California 93943-5000
7a. NAME OF MONITORING ORGANIZATION
    Naval Postgraduate School
7b. ADDRESS (City, State, and ZIP Code)
    Monterey, California
8a. NAME OF FUNDING/SPONSORING ORGANIZATION
8b. OFFICE SYMBOL (if applicable)
8c. ADDRESS (City, State, and ZIP Code)
9.  PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER
10. SOURCE OF FUNDING NUMBERS
    PROGRAM ELEMENT NO / PROJECT NO / TASK NO /
    WORK UNIT ACCESSION NO
11. TITLE (Include Security Classification)
    OPTICAL LASER TECHNOLOGY, SPECIFICALLY CD-ROM, AND ITS
    APPLICATION TO THE STORAGE AND RETRIEVAL OF
    INFORMATION (u)
12. PERSONAL AUTHOR(S)
    Lind, David J.
13a. TYPE OF REPORT
    Master's Thesis
13b. TIME COVERED
    FROM    TO
14. DATE OF REPORT (Year, Month, Day)
    1987 June
15. PAGE COUNT
16. SUPPLEMENTARY NOTATION
17. COSATI CODES
    FIELD / GROUP / SUB-GROUP
18. SUBJECT TERMS (Continue on reverse if necessary and
    identify by block number)
    CD-ROM; CD ROM; CDROM; Optical Laser Disc/Disk;
    Information Storage; Information Retrieval
19. ABSTRACT (Continue on reverse if necessary and identify
    by block number)

    One of the significant problems of this "information
    age" is the production of vast amounts of information
    in a form that is neither convenient nor cost
    effective. This information is most often produced and
    distributed on paper and the resultant effort in
    production, distribution and retrieval is herculean. A
    possible solution to this, is the new optical laser
    technology and its use in the storage and retrieval of
    large amounts of information. Through the use of this
    technology in the non-classified areas of the
    Department of Defense the effort in all three areas can
    be greatly reduced and the end user can become more
    efficient. In many areas of DOD, the greatest benefit
    would be the regained space and weight associated with
    the distribution of the manuals and other typically
    paper products on a Compact Disc - Read Only Memory
    (CD-ROM). One CD-ROM weighs less than an ounce and is
    capable of storing over 270,000 pages of text.

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT
    UNCLASSIFIED/UNLIMITED / SAME AS RPT / DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION
    Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL
22b. TELEPHONE (Include Area Code)
    [illegible]
22c. OFFICE SYMBOL
    Code [illegible]

DD FORM 1473, 84 MAR   83 APR edition may be used until
exhausted. All other editions are obsolete.

SECURITY CLASSIFICATION OF THIS PAGE
UNCLASSIFIED


Approved for public release; distribution is unlimited.


Optical Laser Technology, Specifically CD-ROM, and Its
Application to the Storage and Retrieval of Information


by


David J. Lind
Lieutenant Commander, United States Navy
B.S., United States Naval Academy, 1972


Submitted in partial fulfillment of the
requirements for the degree of


MASTER OF SCIENCE IN INFORMATION SYSTEMS


from the
NAVAL POSTGRADUATE SCHOOL
June 1987


ABSTRACT 


One of the significant problems of this
"information age" is the production of vast amounts of
information in a form that is neither convenient nor
cost effective. This information is most often
produced and distributed on paper and the resultant
effort in production, distribution and retrieval is
herculean. A possible solution to this, is the new
optical laser technology and its use in the storage and
retrieval of large amounts of information. Through the
use of this technology in the non-classified areas of
the Department of Defense the effort in all three areas
can be greatly reduced and the end user can become more
efficient. In many areas of DOD, the greatest benefit
would be the regained space and weight associated with
the distribution of the manuals and other typically
paper products on a Compact Disc - Read Only Memory
(CD-ROM). One CD-ROM weighs less than an ounce and is
capable of storing over 270,000 pages of text. The
saved shipping and handling costs alone would be
astronomically reduced not to mention the end user who
would have a more effective and efficient product. The
CD-ROM is designed to work as a peripheral device to a
microcomputer and can therefore be made available to
any user with an IBM compatible microcomputer. The


PRODUCTION STAGES (DESIGN TO DELIVERY--THE
GENERIC PLAN)

    A. CONVERSION [illegible]
    B. STEPS IN DEVELOPING A CD-ROM PRODUCT
        1. User Requirement Definition
        2. Delivery System Definition
        3. Data Collection
        4. Data Conversion
        5. Data Indexing
        6. Logical Formatting
        7. Premastering
        8. Mastering

APPLICATION DEMONSTRATION (DESIGN TO DELIVERY--
A SPECIFIC EXAMPLE)

    A. SPONSOR
    B. DATA SOURCE
    C. HARDWARE
    D. SOFTWARE
    E. CD-ROM

CONCLUSION

RECOMMENDATION

LIST OF TABLES


1. Physical Characteristics Comparison Of Optical Media
2. Summary Of CD-ROM Capacity Equivalents
3. CD-ROM Advantages And Disadvantages
4. Comparative Access Speeds
5. CD-ROM Characteristics Summary
6. Naval Supply Centers And Depots
7. Clasix DataDrive Performance Characteristics
8. Cost Of TLOCD Application





I. INTRODUCTION/BACKGROUND


A. GENERAL

The information age is upon us. It was reported
that in 1985 the number of pages of printouts exceeded
2,000 for every man, woman, and child in America.
[Ref. 1] What will we do, especially in the military,
to meet this new era with the resources at hand? We
cannot afford to be left behind, whether by technology
or techniques. Some contemporaries have described it
as an information explosion, and yet, an explosion is a
singular, albeit powerful, event. The ground swell of
this event is better described as a snowball rolled
from the peak of the highest mountain. As it tumbles
downward, it continues to increase its momentum as it
picks up more snow and velocity along its path. From
our vantage point, the slope is infinite, and although
minor obstacles may be met along the way, it will
continue on and on.


B. OBJECTIVE

The objective of this thesis is actually three-fold.
First, the current technological capabilities in
the area of optical laser research, as they apply to


not without cost and a sponsor was sought. The
purchase of a CD-ROM disc drive, associated hardware
and software and the cost of the services to index and
master the discs, were the major costs. Naval Supply
Systems Command in Washington, D.C. identified a need,
a prototype application and provided support funding
for this project. The hardware was purchased and the
complicated process of data formatting and transfer to
disc was accomplished. The actual object of the
resultant demonstration was a portion of an extremely
large database consisting of over 3 gigabytes (gbytes)
of information composed of over 12 million total
records. The prototype application dealt with
approximately 360 Mbytes and slightly more than 2
million records. Although a single CD-ROM can hold up
to 540 MB, the total quantity of actual data held on a
disc is often much less due to the indexing
requirements. Sophisticated indexing schemes can even
require more space than the data itself. An example of
this is Grolier Electronic Encyclopedia which requires
60 MB to accommodate the actual text of the
encyclopedia and 50 MB to accommodate the sophisticated
index. (See Figure 1)

The desired result of the research was to free up
the two large Transaction Ledger On Disc (TLOD) disc
packs each containing approximately 540 Mbytes of data





[Figure 2 labels: INFORMATION SURFACE; OBJECTIVE LENS;
PHOTODETECTOR]

SOURCE: CD-ROM The New Papyrus, p. 59.

Figure 2. Optical Head Of A CD-ROM Drive


[Figure 3 tree labels: OPTICAL STORAGE - METHODS AND
VARIETIES; OPTICAL DISCS; READ ONLY; USER RECORDABLE;
DIGITAL (Audio/Data*); ANALOG (Video); WRITE ONCE
(Non-erasable); MULTIPLE WRITE (Erasable); *CD-ROM]

Figure 3. Optical Storage--Methods And Varieties


adjacent nonreflective pits is called a land and can
also vary in its representation of data from 2 to many
bits. In CD-ROM coding, a binary one is represented by
the transition from pit to land and land to pit, and 2
or more zeros are represented by the distance between
transitions. (See Figure 4) The resultant series of
lands and grooves are ultimately interpreted as one's
and zero's and thus a wide variety of digitally encoded
information can be stored on disc. When "reading" an
optical disc, a low-power laser senses the presence
or absence of the lands and grooves by means of
reflected light energy. The small laser beam used to
read back data is reflected from the lands, and
scattered by the pits.

Of the prerecorded discs, the CD-ROM is the most 
common and draws heavily on it’s predecessor, the CD- 
Audio Disc, for format, wide acceptance and 
manufacturing facilities. The recording format is a 
spiral groove approximately 3 miles long with a 
capacity of 540 MB. The tracking is maintained via the 
constant linear velocity (CLV) technique which requires 
variation of the disc rotation speed based on the 
distance of the read head form the center of the disc. 
The prerecorded disc is 4.72 inches in diameter and 
it’s uses are primarily in the area of database 


distribution and permanent archival of vast amounts of 




records. The other type of prerecorded optical disc is 
the Optical Read Only Memory (OROM) which is slightly 
larger than the CD-ROM. The OROM discs are generally 
5.24 inches in diameter and may be formatted with 
either concentric or spiral tracks. Although the 
capacity of the discs is very similar, OROM is often 
operated in a constant angular velocity (CAV) mode, 
thus allowing for faster access times. The typical 5 
1/4" floppy disc used the CAV technique. Also, OROM 
may be two sided. The predominance of the CD-ROM is 
most probably due to its similarity to the large CD- 
Audio market, and the fact that CD-ROM is the only form 
of optical-recording that, as of this writing, has an 
established standard. The OROM is not expected to make 
a significant impact in the near future and indeed may 
be subsumed by the more dominant forms of optical- 
recording. OROM will therefore not be further 
addressed in this paper. 

Of the two types of recordable discs, the WORM 
generally uses the CAV technique and the erasable disc 
technology is currently experimenting with both 
techniques without a clear winner yet identified. Some 
of the varying physical characteristics can be seen in 
the following table. 




APPENDIX B 


RESULTS USING AN OPTICAL CHARACTER READER 


This appendix includes the pages contained in Appendix 
A, scanned through a Kurzweil 5000 Intelligent Character 
Recognition scanner. These pages are unedited to show 
character recognition and formatting errors. Page breaks 
were entered to help clarify the text that was scanned. 




PCAC LASER TECHNOLOGY, SPECIFICALLY CD-ROM, AND ITS 
APPLICATION TO THE STORAGE 
— AND RETRIEVAL OF INFORMATION 
by 
David 3. Lind 


Urune lo-7 


Thesis Advisor:Barry Frew 


Approved for public release; distribution is unlimited 


us 


UNCLASSIFIED 
SECURITY CLASSIFICATION OF THIS PAGE 


REPORT OOCUMENTATION PAGE 
la REPORT SECURITY CLASSIFICATION10. RESTRICTIVE MARKINGS 


Unclassif *ed 
2a SECURITY CLASSIFICATION AUTHORITY3 DISTRIBUTION: 
AVAILABILITY OF REPORT 


26.DECLASSIFICATION I DOWNGRADING SCHEDULEApproved for 
public release; 


Distribution is Unlim © 
4 PERFORMING ORGANIZATION REPORT NUMBER(S)5 MONITORING 
ORGANIZATION REPORT NUMBER(S) 


5a. NAME OF PERFORMING ORGANIZATION 60 OFFICE SYMBOL 7a NAME 
OF MONITORING ORGANIZATION 


Naval Postgraduate School (If apphcable) 
Code 54Naval Postgraduate School 


6c. ADDRESS (City, State, anIII ZIP CooIe)lb. ADDRESS (Clry 
State, a- ZIP CocIe) 


Monterey, California 93943-5000Monterey, California 93943- 
5000 


8a NAME OF FUNDING/SPONSORINGBO.OFFICE SYMBOL9. PROCUREMENT 
INSTRUMENT IDENTIFICATION NUMBER 
ORGANIZATION(If applicable) 


Bc. ADDRESS (cry, State, a- 2'JPC'o0e)10 SOURCE OF FUNDING 
NUMBERS 


74 


PROGRAMPROjECTTASKWORK UNIT 
ELEMENTNONONOACCESSION NO 


1 1 TITLE (Include Securi Clawfica Ct on 


OPTICAL LASER TEC'HNObOGY, SPECIFICALLY CD-ROM, AND ITS 
APPLICATION TO THE STORAGE AND RETRIEVAL OF INFORMATION (u) 


12 PERSONAL AUTHOR(S) 
Lind-, David 3. 


' 3a TYPE OF REPORT130 TIME COVERED14 DATE OF REPORT (Year, 
Month, Day) IS PAGE COUNT 


Master’s ThesisFROM TO1987 June 


6 SUPPLEMENTARY NOTATION 


I 7COSATI CODES10 SUBJECT TERMS (Continue on reverte if 
necevary and identily by block number) 


F:ELDGROUPSUBGROUPCD-ROM; CD ROM; CDROM; Optical Laser 
base/ Disk; 
Information Storage;. Information Retrieval 


‘9 ‘ABSTRACT (Continue on reverte if necetiry and identily 
by block number) 


One of the significant problems of this "information age" is 
the production of vast amounts of information in a form that 
is neither convenient nor cost effective. This information 
is most often produced and distributed on paper and the 
resultant effort in production, distribution and retrieval 
is herculean. A possible solution to this, is the ne'- 
optical laser technology and its use in the storage and 
retrieval of' large amounts of information. Through the use 
of this technology in the non-classified areas of the 
Department of Defense the effort in all three areas can be 


US 


greatly reduced and the end user can become more efficient. 
In many areas '--f DOD, the greatest benefit would be the 
re.gained space and weight associated with the distribution 
of the manuals and othe.r typically paper products on a 
Compact Disc - Read Only Memory (CD-ROM). One CD-ROM weighs 


20 D-S'.RI3UT10N /AVAILABILITY OF ABSTRACT 21 ABSTRACT 
SECURITY 
CLASSIFICATION 


UNCLA5SIFIEDryNLIMITEDO SAME AS RpTO DTIC USERS Unc las @s5 
fied 


22a NAME OF RESPONSIBLE INDIVIDUAL220 TELEPHONEInclude22c 
OFEICE SYMBOL 


Barry Frew(408)eaCode Code 54Fw 
00 FORM 1473, 84 MARO3 APR edition may be used until 


exhaustedSECURITY CLASSIFiCATION OF 'NIS PAGE 


All other editions are oblsolete UNCLASSIFIED 


mo 


Approved for public release; distribution Is unlimited. 


Optical Laser Technology, Specifically CD-ROM, and Its 
Application to the Storage and Retrieval of Information 


by 


Dawid. Jo Lind 
Lieutenant Commander, Vnited States Navy 
B.S., Dnited States Naval Academy, 1972 


Submitted in PartIal fulfillment of the- 
requirements for the degree of 


MASTER OF SCIENCE IN INFORMATION SYSTEMS 


from the 
NAVAL POSTGRADDATE SCSOOL 
June 1987 


Author: 


Approved by: 
Ba y Frew, Thesis Advisor 


ng Chou, Second Reader 


Willis ~eer, Jr., halrman Department of Administrative 
Sciences 


17 


-Kneale T. Marshall Dean of Information and Policy Sciences 


78 


IV. PRODUCTION STAGES (DESIGN TO DELIVERY--TEE GENERIC 
EE o osos x Ge E AIR RR de T TS SE 47 


A. CONVERSION47 

B.STEPS IN DEVELOPING A CD-ROM PRODUCT49 
1. User Requirement Definition49 

2. Delivery System Definition49 

3. Data Collection49 

4. Data Conversion50 

5. Data Indexing50 

MR Logical Formatting-.52 

7. Premastering52 


8. Mastering52 


V.APPLICATION DEMONSTRATION (DESIGNTODELIVERY-- 
A SPECIFIC EXAMPLE)S54 


A. SPONSOR54 
BEDATA SOURCES 
C.EARDWARE56 
D.SOFTWARE56 
Eee ROMS 7 


ERCOSTSS 


79 


VI .SUMMARY60 


A. CONCLUSION60 


B. RECOMMENDATION62 





80 


LISL OE TABLES 


l.Physical Characteristics Comparison Of 
Optical Media21 


2. Summary Of CD-ROM Capacity Equivalents4 

3.- CD-ROM Advantages And Disadvantages26 

4. Comparative Access Speeds27 

5. CD-ROM Characteristics Summary8 

6. Naval Supply Centers And Depots55 

7. Clasix DataDrive Performance Characteristics57 


BEENCOSt Of TLOCD Application59 


81 








I.  INTRODUCTION/BACKGROUND 


A.GENERAL 

The information age is upon us. It was reported that in 
1985 the number of pages of printouts exceeded 2,000 for 
every man, woman, and child in America. [Ref. 1] What will 
we do, especially in the military, to- meet this new era 
with the resources at hand? We cannot afford to be left 
behind,  whether- by technology or techniques. Some 
contemporaries  have- described it as an information 
explosion, and yet, an explosion is a singular, albeit 
powerful, event. The ground swell of this event is better 
described a- a snowball rolled from the peak of the highest 
mountain. As it tumbles downward, it continues to increase 
it’s momentum as it picks up more snow and velocity along 
it's path. From our vantage point, the slope is infinite, 
and although minor obstacles may---be met along the way, it 
will continue on and on. 


B.OBJECTIVE 

The objective of this thesis is actually threefold. First, 
the current technological capabilities in the area of 
optical laser research, as they apply to 


Pr 


83 


not without cost and a sponsor was sought. The purchase of 
a CD-ROM disc drive, associated hardware and software and 
the cost of the services to index and master the discs, were 
the major costs. Naval Supply Systems Command in 
Washington, D.C. identified a need, a prototype application 
and provided support funding for this stones ' The 
hardware was purchased and the complicated process of data 
formatting and transfer to disc was accomplished. The 
actual object of the resultant demonstration was a portion 
of an extremely large database consisting of over 3 
gigabytes (gbytes) of information composed of over 12 
million total records. The prototype application dealt 
with-approximately 360 Mbytes and slightly more than 2 
Million records. Although a single CD-ROM can hold up to 
540 MB, tee total quantity of actual data held on a disc is 
often much less due to the indexing requirements. 
Sophisticated indexing schemes can even require more space 
than the data itself. An example of this is Grolier 
Electronic Encyclopedia which requires 60 MB to accommodate 
the actual text of the encyclopedia and SO MB to accommodate 
the sophisticated index. (See Figure 1) 

The desired result of the research was to free up the two 
large Transaction Ledger On Disc (TLOD) disc packs each 
containing approximately 540 Mbytes of data 


ES 


84 


adjacent nonreflective pits is called a land and can also 
Ln its representation of data from 2 to many bits. In 
CD-ROM coding, a binary one is represented by the transition 
from pit to land and land to pit, and 2 or more zeros are 
represented by the distance between transitions. (See 
Figure 4) The resultant series lands and grooves are 
ultimately interpreted as one’s and zero’s and thus a wide 
variety of digitally encoded information can be stored on 
disc. When "reading" an optical disc, a low-power laser,- 
senses, the presence or absence of the lands and grooves by 
means of reflected light energy. The small laser beam used 
to read back data is reflected from the lands, and scattered 
by the pits. 

Of the prerecorded discs, the CD-ROM is the most common and 
draws heavily on it’s predecessor, the CDAudio Disc, for 
format, wide acceptance and manufacturing facilities. The 
recording format is a spiral groove approximately 3 miles 
long with a capacity of 540 MB.. The tracking is maintained 
via the constant linear velocity (CLV) technique which 
requires variation of the disc rotation-. speed. based on- 
the distance of the read head form the center of the disc. 
The prerecorded disc is 4.72 inches in diameter and it’s 
uses are primarily in the area of database distribution and 
permanent archival of vast amounts of 


18 


So 


records. The other type of prerecorded optical disc isthe 
Optical Read Only Memory (OROM) which is slightly larger 
than the CD-.ROM. The OROM discs are generally 5.24 inches 
in diameter and may be formatted with either concentric or 
spiral tracks. Although the capacity of the discs is very 
similar, OROM is often operated in a constant angular 
velocity (CAV) mode, thus allowing for faster access times. 
The typical 5 1/4" floppy disc used the CAV technique. 
Also, OROM may be two sided. The. predominance of the CD- 
ROM is most probably due to its- similarity to the large 
CDAudio market,. and the fact that CD-ROM is the only form 
of optical-recording that, as of this writing, has an 

- established standard. The OROM is not expected to make a 


significant 
impact in the near future and indeed may be subsumed by the 
more dominant forms of opticalrecording.  OROM will 


therefore not be further addressed in this paper. 

Of the two types of recordable discs, the WORM generally 
uses the CAV technique and the erasable disc technology is 
curren tly experimenting'- with both techniques without a- 
Clear winner yet identified. Some of the varying physical 
characteristics can be seen in the following table. 


20 


86 


APPENDIX C 
RESULTS USING AN IMAGE SCANNER 


Appendix C contains the cover page and pages 1, 4, and 17 
contained in Appendix A. The first four pages were scanned 
at 200 PPI and the next four pages were scanned at 300 PPI. 




NAVAL POSTGRADUATE SCHOOL 


Monterey, California 





THESIS 


OPTICAL LASER TECHNOLOGY, SPECIFICALLY 
CD-ROM, AND ITS APPLICATION TO THE STORAGE 
AND RETRIEVAL OF INFORMATION 


by 
David J. Lind 


June 1987 


Thesis Advisor: Barry Frew 





Approved for public release; distribution is unlimited 




UNCLASSIFIED 
SECURITY CLASSIFICATION OF THIS PAGE 


REPORT DOCUMENTATION PAGE 

1a. REPORT SECURITY CLASSIFICATION 
1b. RESTRICTIVE MARKINGS 
2a. SECURITY CLASSIFICATION AUTHORITY 
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE 
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution is unlimited 
6a. NAME OF PERFORMING ORGANIZATION: Naval Postgraduate School 
6b. OFFICE SYMBOL (If applicable): Code 54 
6c. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000 
7a. NAME OF MONITORING ORGANIZATION: Naval Postgraduate School 
7b. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000 
8a. NAME OF FUNDING/SPONSORING ORGANIZATION 
8b. OFFICE SYMBOL (If applicable) 
8c. ADDRESS (City, State, and ZIP Code) 
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER 
10. SOURCE OF FUNDING NUMBERS: PROGRAM ELEMENT NO. / PROJECT NO. / TASK NO. / WORK UNIT ACCESSION NO. 
11. TITLE (Include Security Classification): OPTICAL LASER TECHNOLOGY, SPECIFICALLY CD-ROM, AND ITS APPLICATION TO THE STORAGE AND RETRIEVAL OF INFORMATION (u) 
12. PERSONAL AUTHOR(S): Lind, David J. 
13a. TYPE OF REPORT: Master's Thesis 
13b. TIME COVERED: FROM    TO 
14. DATE OF REPORT (Year, Month, Day): 1987 June 
15. PAGE COUNT 
16. SUPPLEMENTARY NOTATION 
17. COSATI CODES: FIELD / GROUP / SUB-GROUP 
19. ABSTRACT (Continue on reverse if necessary and identify by block number) 


One of the significant problems of this "information age" is the produc- 
tion of vast amounts of information in a form that is neither convenient 
nor cost effective. This information is most often produced and distrib- 
uted on paper and the resultant effort in production, distribution and 
retrieval is herculean. A possible solution to this, is the new optical 
laser technology and its use in the storage and retrieval of large amounts 
of information. Through the use of this technology in the non-classified 
areas of the Department of Defense the effort in all three areas can be 
greatly reduced and the end user can become more efficient. In many areas 
of DOD, the greatest benefit would be the regained space and weight asso- 
ciated with the distribution of the manuals and other typically paper 
products on a Compact Disc - Read Only Memory (CD-ROM). One CD-ROM weighs 


18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number): 
CD-ROM; CD ROM; CDROM; Optical Laser Disc/Disk; 
Information Storage; Information Retrieval 

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED / SAME AS RPT. / DTIC USERS 
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified 
22a. NAME OF RESPONSIBLE INDIVIDUAL: Barry Frew 
22b. TELEPHONE (Include Area Code): (408) 646-2928 
22c. OFFICE SYMBOL: Code 54Fw 

DD FORM 1473, 84 MAR    83 APR edition may be used until exhausted. 
All other editions are obsolete. 
SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED 




ABSTRACT 


One of the significant problems of this 
"information age" is the production of vast amounts of 
information in a form that is neither convenient nor 
cost effective. This information is most often 
produced and distributed on paper and the resultant 
effort in production, distribution and retrieval is 
herculean. A possible solution to this, is the new 
optical laser technology and its use in the storage and 
retrieval of large amounts of information. Through the 
use of this technology in the non-classified areas of 
the Department of Defense the effort in all three areas 
can be greatly reduced and the end user can become more 
efficient. In many areas of DOD, the greatest benefit 
would be the regained space and weight associated with 
the distribution of the manuals and other typically 
paper products on a Compact Disc - Read Only Memory 
(CD-ROM). One CD-ROM weighs less than an ounce and is 
capable of storing over 270,000 pages of text. The 
saved shipping and handling costs alone would be 
astronomically reduced not to mention the end user who 
would have a more effective and efficient product. The 
CD-ROM is designed to work as a peripheral device to a 
microcomputer and can therefore be made available to 


any user with an IBM compatible microcomputer. The 







Figure 2. Optical Head Of A CD-ROM Drive 


OPTICAL STORAGE - METHODS AND VARIETIES 

OPTICAL DISCS 
    PRE-RECORDED (READ ONLY) 
        DIGITAL (Audio/Data*) 
        ANALOG (Video) 
    USER RECORDABLE 
        WRITE ONCE (Non-Erasable) 
        MULTIPLE WRITE (Erasable) 

* CD-ROM 


Figure 3. Optical Storage--Methods And Varieties 




NAVAL POSTGRADUATE SCHOOL 


Monterey, California 





THESIS 


OPTICAL LASER TECHNOLOGY, SPECIFICALLY 
CD-ROM, AND ITS APPLICATION TO THE STORAGE 
AND RETRIEVAL OF INFORMATION 


by 
David J. Lind 
June 1987 


Thesis Advisor: Barry Frew 





Approved for public release; distribution is unlimited 




UNCLASSIFIED 
SECURITY CLASSIFICATION OF THIS PAGE 


REPORT DOCUMENTATION PAGE 

1a. REPORT SECURITY CLASSIFICATION 
1b. RESTRICTIVE MARKINGS 
2a. SECURITY CLASSIFICATION AUTHORITY 
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE 
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution is unlimited 
6a. NAME OF PERFORMING ORGANIZATION: Naval Postgraduate School 
6b. OFFICE SYMBOL (If applicable): Code 54 
6c. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000 
7a. NAME OF MONITORING ORGANIZATION: Naval Postgraduate School 
7b. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000 
8a. NAME OF FUNDING/SPONSORING ORGANIZATION 
8b. OFFICE SYMBOL (If applicable) 
8c. ADDRESS (City, State, and ZIP Code) 
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER 
10. SOURCE OF FUNDING NUMBERS: PROGRAM ELEMENT NO. / PROJECT NO. / TASK NO. / WORK UNIT ACCESSION NO. 
11. TITLE (Include Security Classification): OPTICAL LASER TECHNOLOGY, SPECIFICALLY CD-ROM, AND ITS APPLICATION TO THE STORAGE AND RETRIEVAL OF INFORMATION (u) 
12. PERSONAL AUTHOR(S): Lind, David J. 
13a. TYPE OF REPORT: Master's Thesis 
13b. TIME COVERED: FROM    TO 
14. DATE OF REPORT (Year, Month, Day): 1987 June 
15. PAGE COUNT 
16. SUPPLEMENTARY NOTATION 
17. COSATI CODES: FIELD / GROUP / SUB-GROUP 
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number): 
CD-ROM; CD ROM; CDROM; Optical Laser Disc/Disk; 
Information Storage; Information Retrieval 
19. ABSTRACT (Continue on reverse if necessary and identify by block number) 


One of the significant problems of this "information age" is the produc- 
tion of vast amounts of information in a form that is neither convenient 
nor cost effective. This information is most often produced and distrib- 
uted on paper and the resultant effort in production, distribution and 
retrieval is herculean. A possible solution to this, is the new optical 
laser technology and its use in the storage and retrieval of large amounts 
of information. Through the use of this technology in the non-classified 
areas of the Department of Defense the effort in all three areas can be 
greatly reduced and the end user can become more efficient. In many areas 
of DOD, the greatest benefit would be the regained space and weight asso- 
ciated with the distribution of the manuals and other typically paper 
products on a Compact Disc - Read Only Memory (CD-ROM). One CD-ROM weighs 


20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED / SAME AS RPT. / DTIC USERS 
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified 
22a. NAME OF RESPONSIBLE INDIVIDUAL: Barry Frew 
22b. TELEPHONE (Include Area Code): (408) 646-2024 
22c. OFFICE SYMBOL 

DD FORM 1473, 84 MAR    83 APR edition may be used until exhausted. 
SECURITY CLASSIFICATION OF THIS PAGE 




ABSTRACT 


One of the significant problems of this 
"information age" is the production of vast amounts of 
information in a form that is neither convenient nor 
cost effective. This information is most often 
produced and distributed on paper and the resultant 
effort in production, distribution and retrieval is 
herculean. A possible solution to this, is the new 
optical laser technology and its use in the storage and 
retrieval of large amounts of information. Through the 
use of this technology in the non-classified areas of 
the Department of Defense the effort in all three areas 
can be greatly reduced and the end user can become more 
efficient. In many areas of DOD, the greatest benefit 
would be the regained space and weight associated with 
the distribution of the manuals and other typically 
paper products on a Compact Disc - Read Only Memory 
(CD-ROM). One CD-ROM weighs less than an ounce and is 
capable of storing over 270,000 pages of text. The 
saved shipping and handling costs alone would be 
astronomically reduced not to mention the end user who 
would have a more effective and efficient product. The 
CD-ROM is designed to work as a peripheral device to a 
microcomputer and can therefore be made available to 


any user with an IBM compatible microcomputer. The 




Source: CD-ROM The New Papyrus, p. 59. 


Figure 2. Optical Head Of A CD-ROM Drive 


OPTICAL STORAGE - METHODS AND VARIETIES 

OPTICAL DISCS 
    PRE-RECORDED (READ ONLY) 
        DIGITAL (Audio/Data*) 
        ANALOG (Video) 
    USER RECORDABLE 
        WRITE ONCE (Non-Erasable) 
        MULTIPLE WRITE (Erasable) 


Figure 3. Optical Storage--Methods And Varieties 




GLOSSARY 


Analog--Analog data is a representation of information by a 
signal that varies in proportion to the amount of the 
original information. Thus, the size of a signal, such as 
light, is expressed by another signal, an electrical 
voltage, that is proportional to the amount of light 
reflected. 


ANSI--American National Standards Institute 


Application Development--Customer software developed 
according to the user's specification that can include user 
interface, data presentation and integration of the 
information product into existing applications. 


ASCII--American Standard Code for Information Interchange. 
It is the standard table of 7-bit digital representations 
used to transmit information to a printer, other computers, 
or other peripheral devices. 


Binary--Binary data is a representation of numerical 
information that uses only two expressions. These are, 
numerically, the digits "1" and "0", or, electronically, "on" 
and "off." Thus, the on/off representation allows 
electronic storage and manipulation of the information. 


Bit--Binary digit. The smallest part of information in 
binary notation. A bit is written as either 1 or 0 and 
represents either the on or off variation of voltage. 


Board--A printed-circuit board, or card, that mounts onto 
the physical chassis of a computer or peripheral and holds 
the chips and associated wiring. Other cards may be 
plugged into this board. 


BPI--Bits per inch is usually used to describe the 
electronic representation on a video screen; a bit is 
frequently equivalent to a pixel. 




Buffer--An auxiliary storage area for data. Many 
peripherals have buffers used to temporarily store data that 
will be used as time permits. 


Byte--A group of eight bits of digital data which is 
processed together. A byte can have 256 (or 2^8) possible 
combinations of 8 binary digits. 


CAV--Constant Angular Velocity. A technique that spins a 
disc at a constant speed, resulting in the inner disc tracks 
passing the read/write head more slowly than the outer 
tracks. This results in numerous tracks forming concentric 
circles with the storage density being the greatest on the 
inner track. (See also CLV). 


CCD--Charge-Coupled Device is a device composed of a row of 
several thousand small photocells. Each pixel on the output 
image corresponds to a photocell. A CCD is actually a one- 
BRR microcircuit. 


CCITT--Acronym for the French name of the Consultative 
Committee on International Telephone and Telegraph. CCITT 
issues the standards for data compression techniques such as 
Group 3. 


CD--Compact Disc - See CD-ROM 

CDI--Compact Disc Interactive. Physically identical to the 
CD-ROM disc, however, with emphasis on the interactive 
presentation of video, audio, text and data. A self- 
contained multimedia system expected to operate in 
conjunction with home entertainment equipment. 

CD ROM--See CD-ROM 

CDROM--See CD-ROM 

CD-ROM--Compact Disc - Read Only Memory. A computer 
peripheral capable of storing large amounts of data which 
are placed on the disc at the time of manufacture. 


Checksum--A method of checking the accuracy of a character 
transmitted, manipulated, or stored. The checksum is the 
result of the summation of all the digits involved. Used 
for error detection vice error correction. 
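
The summation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `checksum` and `verify`, the choice of summing byte values, and the modulus of 256 are hypothetical and are not taken from any CD-ROM specification.

```python
def checksum(data: bytes, modulus: int = 256) -> int:
    """Sum every byte value and keep the remainder (assumed modulus of 256)."""
    return sum(data) % modulus


def verify(data: bytes, expected: int) -> bool:
    """Return True when the recomputed checksum matches the stored one."""
    return checksum(data) == expected


record = b"CD-ROM"
stored = checksum(record)        # value saved alongside the record

corrupted = b"CD-RON"            # one character changed
transposed = b"DC-ROM"           # two characters swapped, same sum

print(verify(record, stored))      # intact record passes
print(verify(corrupted, stored))   # a changed character is detected
print(verify(transposed, stored))  # a swap is NOT detected by a plain sum
```

The third case shows why a simple sum can only detect some errors and cannot locate or repair them, which is consistent with the glossary's note that a checksum is used for detection rather than correction.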


Chip--The term applied to an integrated circuit that 
contains many electronic circuits. A chip is sometimes 
called an IC or an IC chip. The name is occasionally 
applied to the entire integrated circuit package. 


CIRC--Cross-Interleaved Reed-Solomon Code. The only error 
correction scheme used with CD Audio, and the first layer 
used with CD-ROM. It is implemented in the hardware, and 
uses two independent R-S codes to achieve an error rate of 1 
uncorrected error per 10^9 bytes. 


CLV--Constant Linear Velocity (as opposed to CAV). 
Used with CD-ROM to keep the data moving past the optical 
head at a constant rate. In order to accomplish this, the 
rotational speed of the disc must vary, decreasing as the 
head moves from the inner tracks toward the outer perimeter. 
The range is approximately 500 to 200 rpm for a CD-ROM disc 
drive. 
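
The relationship between head position and rotational speed can be sketched as follows. The linear velocity and the inner and outer radii used here are assumed values chosen for illustration; they do not appear in the glossary, and the function name `clv_rpm` is hypothetical.

```python
import math

# Assumed values for illustration only; none of these come from the glossary.
LINEAR_VELOCITY_M_PER_S = 1.3   # assumed track speed past the optical head
INNER_RADIUS_M = 0.025          # assumed innermost data radius (25 mm)
OUTER_RADIUS_M = 0.058          # assumed outermost data radius (58 mm)


def clv_rpm(radius_m: float,
            velocity_m_per_s: float = LINEAR_VELOCITY_M_PER_S) -> float:
    """Rotation rate (rpm) that keeps the track moving at a constant linear speed."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    revolutions_per_second = velocity_m_per_s / (2 * math.pi * radius_m)
    return revolutions_per_second * 60


inner_rpm = clv_rpm(INNER_RADIUS_M)
outer_rpm = clv_rpm(OUTER_RADIUS_M)
print(f"inner: {inner_rpm:.0f} rpm, outer: {outer_rpm:.0f} rpm")
```

Under these assumptions the computed speeds fall close to the 500 to 200 rpm range quoted in the entry, with the disc turning fastest when the head is nearest the center.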


Code--A method of representing data in a form the computer 
can understand and use. 


Command--A code that represents an instruction for the 
computer. 


CRC--Cyclic Redundancy Code. ECC algorithm for the checking 
of CD-ROM after error correction is performed--only capable 
of error detection. 


Density--The closeness of space distribution on a storage 
medium such as a disc. 


Digital--Digital data is a representation of information by 
numerals. Thus, the size of the electrical voltage is 
expressed as numbers: that is, in digits. 




Disc Preparation--Providing certified tapes and shipping 
containers for customer data. Scanning input tapes for data 
integrity and cleaning up minor problems, building a 
directory (High Sierra or customer), putting the data in 
proper format for the mastering center user. 


DPI--Dots per inch refer to the dots, or spots, of ink 
placed on paper by a printer; each may be composed of more 
than one pixel. 


DOS--See Disk Operating System. 


Double-Density--This term is most often applied to the 
storage characteristics of disks, and generally refers to 
the density of the storage of bits on the disk surface on 
each track. 


DRAW--Direct Read After Write. A write once optical disc 
technology (See also WORM), an error control technique; 
however, it is unable to be used with CD-ROM. 


EBCDIC--Extended Binary Coded Decimal Interchange Code. An 
8-bit code developed by IBM, and used primarily by IBM and 
its compatibles. The code is used to represent 256 numbers, 
letters and characters in a computer system. (See also 
ASCII) 


ECC--Error Correction Coding. The application or addition 
of data to the original data in order to provide a means of 
correction when an error in the original data is detected. 


EDAC--Error Detection and Correction. Redundant information 
which is calculated according to certain algorithms used to 
detect and correct errors when data is read. 


EDC--Error Detection Code. The application of redundant 
data to the original data in order to detect errors. 


GB--See Gigabyte. 
Gbyte--See Gigabyte. 


Giga--1,000,000,000. 




Gigabyte--1,000 megabytes, or 1 billion (10^9) bytes. 


Glass Master--The original glass disc upon which the digital 
information is burned with a laser. From it are formed the 
"stampers" which in turn are used to produce the numerous 
discs, usually by an injection molding process. 


Hardware--The physical computer and all of its component 
parts, as well as any peripherals and interconnecting 
cables. 


Head Crash--When the read-head contacts the magnetic surface 
of the disk--a highly undesirable occurrence. 


High Sierra Group--An ad hoc working group of CD-ROM service 
companies, vendors, and manufacturers which has been a prime 
source of activity in the setting of standards for CD-ROM 
data format and compatibility. The group was named after 
its first meeting place--the High Sierra Hotel at Lake 
Tahoe. The group first met in 1985. 


IC--Integrated Circuit. 


Indexing--The actual processing of all records according to 
the layout and the building of the index file. Indexes 
permit the computer to rapidly locate data without searching 
through the full body of data. Generally, a data item is 
searchable only if it is indexed. 


Indexing Set Up--Tape handling, resource allocation and 
loading the layout programs on the indexing system. 


Instruction--A program step that tells the computer what to 
do for a single operation in a program. 


Interface--A device that serves as a common boundary between 
two other devices, such as two computer systems or a 
computer and peripheral. 


Jewel Box--The plastic container in which the CD-ROM disc is 
generally stored. 




Jukebox--See Optical Jukebox. 

K--Abbreviation for Kilo. 

KB--See Kilobyte. 

Kbyte--See Kilobyte. 

Kilo--A prefix meaning (1) 1000 when used in a mathematical 
expression; or (2) 1,024 (2^10) when used as a unit measure in 
computers. As an example, 16K would equal 16 times 1,024 or 
16,384. 


Kilobyte--A unit of measure in computers that equals 1024 
bytes. 


LAN--Local Area Network. 

Land--The reflective area between two adjacent non- 
reflective pits on a disc. The transition from pit to land 
or land to pit represents a binary 1. (See also Run). 

M--Abbreviation for Mega. 

Magneto-Optic--A form of erasable media that stores 
information in the form of vertically oriented magnetic 
domains. 

Mastering--The entire process involving the scheduling of 
the mastering center, managing artwork and packaging issues 
and Q.A.ing all replicas for data integrity and readability. 

MB--See Megabyte. 

Mbyte--See Megabyte. 

Mega--1,000,000. 

Megabyte--1,000 Kilobytes, or 1 million (10^6) bytes. 

Metal Mother--The negative mold created from the glass 
master which is in turn used to stamp the numerous discs. 
Often called a "stamper". 




Micron--One square micron, the area occupied by 1 bit on a 
CD-ROM. One millionth of a meter. 


Microsecond--One 1/1,000,000th of a second. 


Millisecond--One 1/1,000th of a second. 


MO--See Magneto-Optic. 


MS-DOS--The disk operating system used with IBM computers 
and their compatibles. 


OCR--Optical Character Recognition. Generally used in 
reference to a device capable of scanning printed material 
into a digital form. 


ODS--Optical Digital Data Disc 


Optical Jukebox--A store and read mechanism capable of 
storing and accessing multiple CD-ROMs. Accessing is 
generally accomplished by mechanical means, after which the
discs are placed on a single reader (disc drive) for use. 


OROM--Optical Read Only Memory

Photoconverter--See CCD.


Photocell--A photocell is an electronic component which 
changes a light signal into an electrical signal by 
photoelectric conversion. A photocell is only a few microns 
square. 


Pit--The microscopic depression in the reflective surface of
a disc. The pattern of pits represents the data being
stored on the disc. (See also Land.) The light from the
laser used to read the data is reflected back from the
lands, but scattered by the pits. A typical pit is about
the size of a bacterium: 0.5 by 2.0 microns.


Pixel--A pixel (picture element) is the smallest
controllable element of an image. As resolution (the number
of pixels per inch) increases, pixel size decreases and
details are more accurately represented. Pixels are usually
square, but they may be rectangular or round. The shape is
determined by the optical system of the device.


PPI--Pixels per inch. 


Platter--Generally used in reference to the larger (12")
optical discs. Sometimes in reference to a single layer in 
a magnetic disc pack. 


RAM--Random Access Memory. Semiconductor memory circuits
used to store data and programs in information processing
systems.


Resolution--Resolution is defined as the number of pixels 
read or displayed per inch (PPI), both horizontally and 
vertically. 
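
As a rough illustration of how resolution translates into image size, the sketch below computes the pixel dimensions and uncompressed storage for a scanned page. The 8.5 by 11 inch page, the 300 PPI setting, and the 1 bit per pixel depth are assumptions chosen for the example, and the function names are hypothetical.

```python
def scan_dimensions(width_in, height_in, ppi):
    """Pixel dimensions of a page scanned at the same PPI in both directions."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel=1):
    """Storage needed for the raw image, rounded up to whole bytes."""
    width_px, height_px = scan_dimensions(width_in, height_in, ppi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed example: a letter-size page scanned as a bilevel image at 300 PPI.
print(scan_dimensions(8.5, 11, 300))
print(uncompressed_bytes(8.5, 11, 300))
print(1 / 300)  # width of one pixel in inches at 300 PPI
```

Under these assumptions the page is 2,550 by 3,300 pixels and needs a little over one million bytes before compression. Doubling the PPI halves the pixel size and quadruples the pixel count.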


R/W/E--Read/Write/Erase--An alternative title for erasable
discs.


Run--The distance between transitions either from land to
pit or pit to land. The distance represents two or more
zeros. (See also Land.)
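
The Land and Run entries together describe a transition-based code: a change of surface marks a binary 1, and the stretch between changes carries zeros. The following is a minimal sketch of that idea only. The function names and the sample bit pattern are hypothetical, and the sketch does not model the actual modulation scheme used on a disc.

```python
def bits_to_surface(bits, start="land"):
    """Map bits to a pit/land sequence: a 1 toggles the surface, a 0 keeps it."""
    surface = []
    state = start
    for bit in bits:
        if bit == 1:
            state = "pit" if state == "land" else "land"
        surface.append(state)
    return surface


def surface_to_bits(surface, start="land"):
    """Recover bits by reporting a 1 wherever the surface changes."""
    bits = []
    previous = start
    for state in surface:
        bits.append(1 if state != previous else 0)
        previous = state
    return bits


def run_lengths(bits):
    """Count the zeros between consecutive 1s (the runs)."""
    runs = []
    count = None  # None until the first transition has been seen
    for bit in bits:
        if bit == 1:
            if count is not None:
                runs.append(count)
            count = 0
        elif count is not None:
            count += 1
    return runs


sample = [1, 0, 0, 1, 0, 0, 0, 1]  # hypothetical pattern
surface = bits_to_surface(sample)
print(surface)
print(surface_to_bits(surface) == sample)
print(run_lengths(sample))
```

In the sample pattern the runs between transitions are two and three zeros long, consistent with the entry's statement that a run represents two or more zeros.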


SCSI--Small Computer System Interface--A complete 8-bit
parallel interface bus structure, with rates up to 4
Mbytes/sec, that is subordinate to the rest of the system
architecture. Up to 8 systems and peripherals may be
connected to the same bus.


Software--A general term that applies to any program (set of 
instructions) that can be loaded into a computer from any 
source. 

SPI--Spots per inch. See DPI. 


Stamper--See Metal Mother 


Substrate--The base material from which a disc is made,
generally a strong and transparent polycarbonate plastic. 




Tbyte--Terabyte, or 1,000 gigabytes (10^12 bytes).


Track--A linear, spiral or circular path on which 
information is placed, or found. The portion of a disk 
one read/write head passes over to extract data. Track 
density is measured in tpi (tracks per inch). 


WORM--Write Once Read Many (occasionally seen as Write Once 
Read "Mostly" or "Multiple"). 






DISTRIBUTION LIST 
No. Copies 


Library, Code 01422 2 
Naval Postgraduate School 
Monterey, CA 93943-5002 


Defense Technical Information Center 2 
Cameron Station 
Alexandria, VA 22304-6145 


Chief of Naval Operations 1
Director Information Systems (OP-945)
Navy Department
Washington, DC 20350-2000


Director, Naval Data Automation Command 1
Washington Navy Yard 
Washington, DC 20374-1662 


Commanding Officer 1
Navy Fleet Material Support Office
P.O. Box 2010 (Attn: Code 9513)
Mechanicsburg, PA 17055


Barry A. Frew 6
Administrative Sciences Department
Code 54FW
Naval Postgraduate School
Monterey, CA 93943-5000


Carl R. Jones 1
Administrative Sciences Department
Code 74
Naval Postgraduate School
Monterey, CA 93943-5000


LCDR David J. Lind USN 1
Naval Data Automation Command (Code 30)
Washington Navy Yard
Washington, DC 20374-1662




LT Mike Pafford USN
Commander
Naval Security Group Support Activity (GX)
3801 Nebraska Ave. N.W.
Washington, DC 20393-5220


MAJ Paul W. LeBlanc USMC
Marine Corps Central Design & Programming Activity
Marine Corps Combat Development Command
Quantico, VA 22134


LCDR Robert R. Taylor USN 
Patrol Squadron Seventeen 
FPO San Francisco 96601-5910 


MAJ Elbert T. Shaw USA 
1278 Spruance Rd. 
Monterey, CA 93940 


Office of the Secretary of Defense
DEERS Support Office
Attn: CAPT Dellaporta
2511 Garden Road, Suite 260
Monterey, CA 93940-5331


Commanding Officer
Fleet Material Support Office
Attn: LCDR P.R. Richey
5450 Carlisle Pike
Mechanicsburg, PA 17055

